Treating expression levels of different genes as a sample in microarray data analysis: is it worth a risk?

Stat Appl Genet Mol Biol. 2006:5:Article9. doi: 10.2202/1544-6115.1185. Epub 2006 Mar 24.

Abstract

One of the prevailing ideas in the literature on microarray data analysis is to pool the expression measures across genes and treat them as a sample drawn from some distribution. Several universal laws have been proposed to describe this distribution analytically. This idea raises a number of concerns. First, the expression levels of different genes are not identically distributed random variables, so treating them as a sample amounts to sampling from a mixture of equally weighted distributions, one for each gene. Second, the expression levels of different genes are heavily dependent random variables, so the law of large numbers and statistical goodness-of-fit tests are normally inapplicable to this kind of data. This dependence represents a very serious pitfall in microarray data analysis.
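The dependence concern can be illustrated with a small simulation. The sketch below is not the authors' analysis; it assumes a hypothetical equicorrelated Gaussian model in which every gene on an array shares one common random effect, so that any two genes have correlation `rho`. The function name, the parameter values, and the model itself are illustrative assumptions. Under this model the variance of the across-gene mean is `rho + (1 - rho) / n_genes`, so for `rho > 0` it does not shrink to zero as more genes are pooled, which is the sense in which averaging arguments based on the law of large numbers break down.

```python
import numpy as np


def pooled_mean_variance(n_genes, rho, n_arrays=2000, seed=0):
    """Variance, across simulated arrays, of the mean expression over genes.

    Assumed toy model (not from the paper): on each array, gene g has level
    sqrt(rho) * z + sqrt(1 - rho) * e_g, where z is shared by all genes on the
    array and e_g is gene-specific noise. Each level has unit variance and any
    two genes on the same array have correlation rho.
    """
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n_arrays, 1))        # one effect per array
    noise = rng.standard_normal((n_arrays, n_genes))   # independent per gene
    levels = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * noise
    # Pool across genes within each array, then measure how much the pooled
    # mean still varies from array to array.
    return levels.mean(axis=1).var()


for n in (10, 100, 1000):
    independent = pooled_mean_variance(n, rho=0.0)
    dependent = pooled_mean_variance(n, rho=0.5)
    print(f"genes={n:5d}  var(mean) independent={independent:.4f}  "
          f"dependent={dependent:.4f}")
```

With independent genes the variance of the pooled mean falls roughly as one over the number of genes, whereas with correlated genes it levels off near `rho`. Real microarray data have a far more complicated correlation structure, so this is only a qualitative illustration.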

Publication types

  • Evaluation Study
  • Letter
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression Profiling / methods*
  • Normal Distribution
  • Oligonucleotide Array Sequence Analysis / methods*
  • Risk
  • Stochastic Processes