Statistical analysis principles for Omics data

Methods Mol Biol. 2011;719:113-31. doi: 10.1007/978-1-61779-027-0_5.

Abstract

In Omics experiments, thousands of hypotheses are typically tested simultaneously, each based on very few independent replicates. Traditional tests such as the t-test have been shown to perform poorly with this type of data. Furthermore, the simultaneous consideration of many hypotheses, each prone to a decision error, requires powerful adjustments for multiple testing. After a general introduction to statistical testing, we present the moderated t-statistic, the SAM statistic, and the RankProduct statistic, which have been developed to evaluate hypotheses in typical Omics experiments. We also provide an introduction to the multiple testing problem and discuss some state-of-the-art procedures to address it. The presented test statistics are compared in an analysis of a microarray experiment on tissue samples from two groups of tumors. All calculations can be done using the freely available statistical software R. Commented accompanying code is available at: http://www.meduniwien.ac.at/msi/biometrie/MIMB.
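To make the multiple testing adjustment concrete, the following is a minimal sketch of the Benjamini-Hochberg false discovery rate adjustment. The abstract does not name which procedures the chapter covers, so this procedure is an assumption, as are the function name `benjamini_hochberg` and the made-up p-values. It is written in Python for illustration and is not the authors' R code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative helper (hypothetical name), not taken from the chapter.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values standing in for eight per-gene tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
adj = benjamini_hochberg(p)

# Hypotheses rejected when controlling the false discovery rate at 5%.
significant = [i for i, q in enumerate(adj) if q <= 0.05]
print(significant)
```

In this toy example five raw p-values fall below 0.05, but only the two smallest remain below 0.05 after adjustment, which shows how strongly the adjustment can change conclusions when many hypotheses are tested at once.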

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Computational Biology / standards
  • Data Display
  • Data Interpretation, Statistical*
  • Gene Expression Profiling
  • Humans
  • Information Management
  • Neoplasms / genetics
  • Oligonucleotide Array Sequence Analysis
  • Quality Control