Noise sampling method: an ANOVA approach allowing robust selection of differentially regulated genes measured by DNA microarrays

Bioinformatics. 2003 Jul 22;19(11):1348-59. doi: 10.1093/bioinformatics/btg165.

Abstract

Motivation: A crucial step in microarray data analysis is the selection of subsets of interesting genes from the initial set of genes. In many cases, especially when comparing a specific condition to a reference, the genes of interest are those that are differentially expressed. Two common methods for gene selection are: (a) selection by fold difference (at least n-fold variation) and (b) selection by altered ratio (at least n standard deviations away from the mean ratio).
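The two classical selection rules named above can be illustrated with a minimal Python sketch. The function names, the use of log2 ratios for the standard-deviation rule, and the toy ratios are assumptions made for illustration only; the abstract does not specify these details.

```python
import math
import statistics


def select_by_fold(ratios, n=2.0):
    """Indices of genes whose expression ratio changes at least n-fold, up or down."""
    threshold = math.log2(n)
    return [i for i, r in enumerate(ratios) if abs(math.log2(r)) >= threshold]


def select_by_unusual_ratio(ratios, n_sd=2.0):
    """Indices of genes whose log2 ratio lies at least n_sd standard deviations
    from the mean log2 ratio (working on the log scale is an assumption here)."""
    logs = [math.log2(r) for r in ratios]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return [i for i, x in enumerate(logs) if abs(x - mean) >= n_sd * sd]


# Hypothetical expression ratios (condition / reference) for eight genes.
ratios = [1.0, 1.1, 0.9, 4.0, 0.25, 1.05, 0.95, 1.2]
fold_hits = select_by_fold(ratios, n=2.0)
sd_hits = select_by_unusual_ratio(ratios, n_sd=1.5)
```

Both rules apply a single global cutoff regardless of signal intensity. In this toy set the two strongly changed genes inflate the standard deviation enough that a cutoff of 2 standard deviations selects nothing, which hints at how sensitive the second rule is to the data it is applied to.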

Results: The novel method proposed here is based on ANOVA and uses replicate spots to estimate an empirical distribution of the noise. The measured intensity range is divided into a number of intervals, and a noise distribution is constructed for each interval. Bootstrapping is used to map the desired confidence levels from the noise distribution corresponding to a given interval to the measured log ratios in that interval. If the method is applied to individual arrays having replicate spots, it can calculate an overall width of the noise distribution, which can be used as an indicator of array quality. We compared this method with the fold change and unusual ratio methods. We also discuss the relationship with an ANOVA model proposed by Churchill et al. In silico experiments were performed while controlling the degree of regulation as well as the amount of noise. Such experiments show that the performance of the classical methods can be very unsatisfactory. We also compared the results of the 2-fold method with the results of the noise sampling method using pre- and post-immortalization cell lines derived from the MDAH041 fibroblasts hybridized on Affymetrix GeneChip arrays. The 2-fold method reported 198 genes as upregulated and 493 genes as downregulated. The noise sampling method reported 98 genes as upregulated and 240 genes as downregulated at the 99.99% confidence level. The methods agreed on 221 downregulated genes and 66 upregulated genes. Fourteen genes from the subset reported by both methods were all confirmed by Q-RT-PCR. Alternative assays on various subsets of genes on which the two methods disagreed suggested that the noise sampling method is likely to provide fewer false positives.
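The general idea of intensity-dependent noise thresholds can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the abstract does not say how noise is derived from replicate spots or how the bootstrap maps confidence levels, so the residual definition (replicate minus gene mean), the equal-width binning, the percentile bootstrap, and all names and toy data are hypothetical.

```python
import random
import statistics


def noise_bounds(residuals, confidence, n_boot, rng):
    """Bootstrap lower/upper bounds of the noise distribution of one interval."""
    alpha = (1.0 - confidence) / 2.0
    last = len(residuals) - 1
    lows, highs = [], []
    for _ in range(n_boot):
        # Resample the noise with replacement and read off the two tail quantiles.
        sample = sorted(rng.choices(residuals, k=len(residuals)))
        lows.append(sample[round(alpha * last)])
        highs.append(sample[round((1.0 - alpha) * last)])
    return statistics.mean(lows), statistics.mean(highs)


def noise_sampling_calls(genes, n_bins=2, confidence=0.99, n_boot=200, seed=0):
    """genes: list of (mean_intensity, [replicate log ratios]).
    Returns 'up', 'down' or 'unchanged' for each gene."""
    rng = random.Random(seed)
    intensities = [g[0] for g in genes]
    lo, hi = min(intensities), max(intensities)
    width = ((hi - lo) / n_bins) or 1.0

    def bin_of(x):
        return min(int((x - lo) / width), n_bins - 1)

    # Assumed noise model: deviation of each replicate from its gene's mean.
    residuals = [[] for _ in range(n_bins)]
    for intensity, reps in genes:
        m = statistics.mean(reps)
        residuals[bin_of(intensity)].extend(r - m for r in reps)

    bounds = [noise_bounds(r, confidence, n_boot, rng) if r else None
              for r in residuals]

    calls = []
    for intensity, reps in genes:
        lower, upper = bounds[bin_of(intensity)]
        m = statistics.mean(reps)
        calls.append("up" if m > upper else "down" if m < lower else "unchanged")
    return calls


# Hypothetical data: noisy replicates at low intensity, tight ones at high intensity.
genes = [
    (100, [0.5, -0.5]), (120, [0.4, -0.4]), (110, [0.8, -0.2]), (130, [-0.6, 0.6]),
    (1000, [0.05, -0.05]), (1100, [0.32, 0.28]), (1050, [0.02, -0.02]),
    (1200, [-0.8, -0.7]),
]
calls = noise_sampling_calls(genes)
```

In this toy data the genes at intensities 110 and 1100 have the same mean log ratio of about 0.3, yet only the high-intensity one is called regulated, because the noise in its interval is much narrower. A single global fold cutoff cannot make that distinction.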

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Analysis of Variance
  • Computer Simulation
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation / genetics*
  • Humans
  • Li-Fraumeni Syndrome / genetics
  • Models, Genetic
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Quality Control
  • Reproducibility of Results
  • Sample Size
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid
  • Stochastic Processes