Novel statistical framework to identify differentially expressed genes allowing transcriptomic background differences

Bioinformatics. 2010 Jun 1;26(11):1431-6. doi: 10.1093/bioinformatics/btq163. Epub 2010 Apr 16.

Abstract

Motivation: Tests for differentially expressed genes (DEGs) in microarray experiments are based on the null hypothesis that genes irrelevant to the phenotype/stimulus are expressed equally in the target and control samples. However, this strict hypothesis does not always hold, because target and control samples can differ in their transcriptomic background in several ways, including different cell/tissue types, different cell cycle stages and different biological donors. These differences lead to more false positives, which have little biological/medical significance.

Results: In this article, we propose a statistical framework to identify DEGs between target and control samples from expression microarray data while allowing for transcriptomic background differences between these samples. The framework introduces a modified null hypothesis that the gene expression background difference is normally distributed. We use an iterative procedure to perform robust estimation of the null hypothesis and identify DEGs as outliers. We evaluated our method on our own triplicate microarray experiment, validated with reverse transcription-polymerase chain reaction (RT-PCR), and on the MicroArray Quality Control dataset. The evaluations suggest that our technique (i) yields fewer false positives and false negatives, as measured by the degree of agreement with RT-PCR of the same samples; (ii) can be applied to different microarray platforms and gives better reproducibility, as measured by the concordance of DEG identification both within and across platforms; and (iii) can be applied efficiently with only a few microarray replicates. Based on these evaluations, we propose that this method not only identifies more reliable and biologically/medically significant DEGs, but also eases the power-cost tradeoff problem in the microarray field.
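
The general idea of treating background differences as a normally distributed null and calling DEGs as outliers can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the estimator, the stopping rule or the significance threshold, so the function name `find_outlier_degs`, the parameters `alpha` and `max_iter`, and the use of a simple trim-and-refit loop on per-gene log-ratios are all assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist


def find_outlier_degs(log_ratios, alpha=0.001, max_iter=100):
    """Flag genes whose target/control log-ratio is an outlier under a
    normal null N(mu, sigma^2) that is re-estimated iteratively.

    Hypothetical sketch: the null allows a nonzero mean and a nonzero
    spread, standing in for a transcriptomic background difference.
    """
    x = np.asarray(log_ratios, dtype=float)
    # Two-sided critical value of the standard normal for the chosen alpha.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    # Start by assuming every gene belongs to the background (null) set.
    is_null = np.ones(x.shape, dtype=bool)
    for _ in range(max_iter):
        # Estimate the null from the genes currently considered background.
        mu = x[is_null].mean()
        sigma = x[is_null].std(ddof=1)
        # Genes outside the central (1 - alpha) region are treated as outliers.
        new_is_null = np.abs(x - mu) <= z_crit * sigma
        if np.array_equal(new_is_null, is_null):
            break  # the background set stopped changing
        is_null = new_is_null

    return ~is_null, mu, sigma
```

Calling `find_outlier_degs(values)` on an array of per-gene log-ratios returns a boolean mask of candidate DEGs together with the estimated background mean and standard deviation. Because the standard deviation is computed after trimming, it slightly underestimates the true spread; a more careful version would correct for the truncation or use a robust scale estimate.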

Availability: Source code and binaries freely available for download at http://comonca.org.cn/fdca/resources/softwares/deg.zip.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers, Tumor / metabolism
  • Cell Line, Tumor
  • Gene Expression Profiling / methods*
  • Humans
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis / methods
  • Reverse Transcriptase Polymerase Chain Reaction

Substances

  • Biomarkers, Tumor