Computing and graphing probability values of Pearson distributions: a SAS/IML macro

Source Code Biol Med. 2019 Dec 20;14:6. doi: 10.1186/s13029-019-0076-2. eCollection 2019.

Abstract

Background: Any empirical data can be approximated by one of the Pearson distributions using the first four moments of the data (Elderton WP, Johnson NL. Systems of Frequency Curves. 1969; Pearson K. Philos Trans R Soc Lond Ser A. 186:343-414 1895; Solomon H, Stephens MA. J Am Stat Assoc. 73(361):153-60 1978). Pearson distributions thus make statistical analysis possible for data with unknown distributions. Both older printed tables (Pearson ES, Hartley HO. Biometrika Tables for Statisticians, vol. II. 1972) and contemporary computer programs (Amos DE, Daniel SL. Tables of percentage points of standardized Pearson distributions. 1971; Bouver H, Bargmann RE. Tables of the standardized percentage points of the Pearson system of curves in terms of β1 and β2. 1974; Bowman KO, Shenton LR. Biometrika. 66(1):147-51 1979; Davis CS, Stephens MA. Appl Stat. 32(3):322-7 1983; Pan W. J Stat Softw. 31(Code Snippet 2):1-6 2009) are available for obtaining percentage points of Pearson distributions corresponding to certain pre-specified percentages (or probability values; e.g., 1.0%, 2.5%, 5.0%, etc.). However, they are of little use in statistical analysis, because calculating the probability value of a Pearson distribution that corresponds to a given percentage point, such as an observed test statistic in hypothesis testing, requires unwieldy second-difference interpolation.

Results: The present study develops a SAS/IML macro program that identifies the appropriate type of Pearson distribution from either an input dataset or the values of the four moments, and then computes and graphs probability values of the Pearson distribution for any given percentage points.
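As an illustration of the type-identification step only, the following Python sketch classifies a distribution from its moments. It is not the authors' SAS/IML macro. It assumes the four moments are the mean, variance, skewness, and kurtosis, that the moments are population-style (divisor n), and that the type is chosen with the classical κ criterion described in Elderton and Johnson; the function names, type labels, and tolerance are hypothetical choices made for this example.

```python
def sample_moments(data):
    """Return (mean, variance, skewness, kurtosis) using divisor n.

    Assumption: population-style moments; the macro may use a
    different estimator.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("data have zero variance")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


def pearson_type(skewness, kurtosis, tol=1e-9):
    """Classify the Pearson type from skewness and (non-excess) kurtosis.

    Uses beta1 = skewness^2, beta2 = kurtosis, and
    kappa = beta1*(beta2+3)^2 / (4*(4*beta2-3*beta1)*(2*beta2-3*beta1-6)).
    Returns one of "0" (normal), "I", "II", "III", "IV", "V", "VI", "VII".
    """
    beta1 = skewness ** 2
    beta2 = kurtosis
    # Every distribution satisfies beta2 >= beta1 + 1.
    if beta2 < beta1 + 1 - tol:
        raise ValueError("impossible moment combination: beta2 < beta1 + 1")

    # Symmetric cases (kappa = 0) are separated by the kurtosis.
    if beta1 < tol:
        if abs(beta2 - 3) < tol:
            return "0"
        return "II" if beta2 < 3 else "VII"

    # On the line 2*beta2 - 3*beta1 - 6 = 0, kappa is infinite: type III.
    denom = 2 * beta2 - 3 * beta1 - 6
    if abs(denom) < tol:
        return "III"

    kappa = beta1 * (beta2 + 3) ** 2 / (4 * (4 * beta2 - 3 * beta1) * denom)
    if kappa < 0:
        return "I"
    if abs(kappa - 1) < tol:
        return "V"
    return "IV" if kappa < 1 else "VI"
```

For example, `pearson_type(1.0, 4.5)` returns `"III"`, matching a gamma distribution with shape 4 (skewness 1, kurtosis 4.5). Computing probability values for the identified type would additionally require integrating the corresponding density, which this sketch does not attempt.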

Conclusions: The SAS macro program returns accurate approximations to Pearson distributions and can help researchers efficiently conduct statistical analysis on data with unknown distributions.

Keywords: Curve fitting; Distribution-free statistics; Hypothesis testing; Pearson distributions.