Integrated data analysis for genome-wide research

Matthias Steinfath; Dirk Repsilber; Matthias Scholz; Dirk Walther; Joachim Selbig

doi:10.1007/978-3-7643-7439-6_13

EXS. 2007:97:309-29. doi: 10.1007/978-3-7643-7439-6_13.

Authors

Matthias Steinfath¹, Dirk Repsilber, Matthias Scholz, Dirk Walther, Joachim Selbig

Affiliation

¹ Institute for Biology and Biochemistry, University Potsdam, c/o MPI-MP Am Mühlenberg 1, D-14476 Potsdam-Golm, Germany. steinfath@mpimp-golm.mpg.de

PMID: 17432273
DOI: 10.1007/978-3-7643-7439-6_13

Abstract

Integrated data analysis is introduced as the intermediate level of a systems biology approach for analysing different 'omics' datasets, i.e., genome-wide measurements of transcripts, protein levels or protein-protein interactions, and metabolite levels, with the aim of generating a coherent understanding of biological function. In this chapter we focus on methods of correlation analysis that have recently been applied in molecular biology, ranging from simple pairwise correlation to kernel canonical correlation. Several examples are presented to illustrate their application. Because the input data for this analysis frequently originate from different experimental platforms, preprocessing steps such as data normalisation and missing value estimation are inherent to this approach. The corresponding procedures, potential pitfalls and biases, and available software solutions are reviewed. The multiplicity of observations obtained in omics-profiling experiments necessitates the application of multiple testing correction techniques.
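
The simplest case named in the abstract, pairwise correlation followed by multiple testing correction, might look roughly like the sketch below. It is an illustration only, not the chapter's own procedure. The abstract does not say which correction technique is used, so the Benjamini-Hochberg adjustment is an assumption, as is the Fisher z-transform approximation for the p-values. The profile names and values (`t1`, `m1`, and so on) are made-up toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0.

    Assumption: uses the Fisher z-transform with a normal approximation,
    which requires n > 3 and is only rough for small sample sizes.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| = 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy profiles measured on the same six samples.
transcripts = {
    "t1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "t2": [2.0, 1.5, 3.1, 2.2, 2.9, 1.8],
}
metabolites = {
    "m1": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "m2": [5.0, 3.0, 4.0, 6.0, 2.0, 4.0],
}

# Correlate every transcript with every metabolite, then adjust jointly.
pairs = [(t, m) for t in transcripts for m in metabolites]
r_values = [pearson(transcripts[t], metabolites[m]) for t, m in pairs]
p_raw = [fisher_p_value(r, 6) for r in r_values]
p_adj = benjamini_hochberg(p_raw)

for (t, m), r, p, q in zip(pairs, r_values, p_raw, p_adj):
    print(f"{t} vs {m}: r={r:+.3f}  p={p:.4f}  adjusted p={q:.4f}")
```

In a genome-wide setting the number of transcript-metabolite pairs runs into the millions, which is why the unadjusted p-values alone would yield many spurious associations.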

Publication types

Review

MeSH terms

Cluster Analysis
Genome*
Genomics / statistics & numerical data
Principal Component Analysis
Proteomics / statistics & numerical data
Software