A New Distribution Family for Microarray Data

Microarrays (Basel). 2017 Feb 10;6(1):5. doi: 10.3390/microarrays6010005.

Abstract

The traditional approach with microarray data has been to apply transformations that approximately normalize them, with the drawback of losing the original scale. The alternative stand point taken here is to search for models that fit the data, characterized by the presence of negative values, preserving their scale; one advantage of this strategy is that it facilitates a direct interpretation of the results. A new family of distributions named gpower-normal indexed by p∈R is introduced and it is proven that these variables become normal or truncated normal when a suitable gpower transformation is applied. Expressions are given for moments and quantiles, in terms of the truncated normal density. This new family can be used to model asymmetric data that include non-positive values, as required for microarray analysis. Moreover, it has been proven that the gpower-normal family is a special case of pseudo-dispersion models, inheriting all the good properties of these models, such as asymptotic normality for small variances. A combined maximum likelihood method is proposed to estimate the model parameters, and it is applied to microarray and contamination data. Rcodes are available from the authors upon request.

Keywords: combinedmaximumlikelihoodestimators; data analysis; gpower-normal; microarrays; pseudo-dispersion models; truncated normal.