Statistical analysis principles for Omics data

Daniela Dunkler; Fátima Sánchez-Cabo; Georg Heinze

doi:10.1007/978-1-61779-027-0_5

Methods Mol Biol. 2011:719:113-31. doi: 10.1007/978-1-61779-027-0_5.

Authors

Daniela Dunkler¹, Fátima Sánchez-Cabo, Georg Heinze

Affiliation

¹ Section of Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria.

PMID: 21370081
DOI: 10.1007/978-1-61779-027-0_5

Abstract

In Omics experiments, typically thousands of hypotheses are tested simultaneously, each based on very few independent replicates. Traditional tests like the t-test were shown to perform poorly with this new type of data. Furthermore, simultaneous consideration of many hypotheses, each prone to a decision error, requires powerful adjustments for this multiple testing situation. After a general introduction to statistical testing, we present the moderated t-statistic, the SAM statistic, and the RankProduct statistic which have been developed to evaluate hypotheses in typical Omics experiments. We also provide an introduction to the multiple testing problem and discuss some state-of-the-art procedures to address this issue. The presented test statistics are subjected to a comparative analysis of a microarray experiment comparing tissue samples of two groups of tumors. All calculations can be done using the freely available statistical software R. Accompanying, commented code is available at: http://www.meduniwien.ac.at/msi/biometrie/MIMB.
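To illustrate why the multiple testing situation needs adjustment: with 10,000 hypotheses each tested at a level of 0.05, about 500 would be expected to be declared significant by chance alone even if no true effects exist. The abstract does not say which adjustment procedures the chapter covers, so the following is only a minimal sketch using the Benjamini-Hochberg false discovery rate adjustment as one widely used example. It is written in Python rather than R to stay dependency-free; the function name `benjamini_hochberg` and the p-values are made up for illustration and are not taken from the chapter or its accompanying code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative helper (hypothetical name), not the chapter's code.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values standing in for ten per-gene tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
adj = benjamini_hochberg(p)

# Indices of tests called significant at a false discovery rate of 5%.
significant = [i for i, q in enumerate(adj) if q <= 0.05]
```

In this made-up example, five of the raw p-values fall below 0.05, but only the two smallest remain below 0.05 after adjustment.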

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology / methods*
Computational Biology / standards
Data Display
Data Interpretation, Statistical*
Gene Expression Profiling
Humans
Information Management
Neoplasms / genetics
Oligonucleotide Array Sequence Analysis
Quality Control