Integrated data analysis for genome-wide research

EXS. 2007;97:309-29. doi: 10.1007/978-3-7643-7439-6_13.

Abstract

Integrated data analysis is introduced as the intermediate level of a systems biology approach for analysing different 'omics' datasets, i.e., genome-wide measurements of transcripts, protein levels or protein-protein interactions, and metabolite levels, with the aim of generating a coherent understanding of biological function. In this chapter we focus on methods of correlation analysis, ranging from simple pairwise correlation to kernel canonical correlation, which have recently been applied in molecular biology. Several examples are presented to illustrate their application. The input data for this analysis frequently originate from different experimental platforms. Preprocessing steps such as data normalisation and missing value estimation are therefore inherent to this approach. The corresponding procedures, potential pitfalls and biases, and available software solutions are reviewed. Because omics-profiling experiments yield many observations that are tested simultaneously, multiple testing correction techniques must be applied.
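
As an illustration of the simplest case named in the abstract, the following minimal sketch computes a pairwise Pearson correlation between two profiles and then adjusts a set of p-values for multiple testing. It is not taken from the chapter. The function names, the toy profiles, the Fisher z approximation for the p-value, and the choice of the Benjamini-Hochberg procedure are all assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Approximate two-sided p-value for r via the Fisher z-transform.

    Assumption: a normal approximation is acceptable; it is rough for small n.
    """
    if n <= 3:
        raise ValueError("the Fisher z approximation needs n > 3")
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite at |r| = 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical toy profiles: one transcript against three metabolites.
    transcript = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8]
    metabolites = {
        "met_a": [0.9, 2.0, 3.1, 3.9, 5.2, 6.1],
        "met_b": [3.0, 1.0, 4.0, 1.5, 5.0, 2.0],
        "met_c": [6.0, 5.1, 3.8, 3.1, 2.2, 0.9],
    }
    names = list(metabolites)
    rs = [pearson(transcript, metabolites[k]) for k in names]
    ps = [fisher_z_pvalue(r, len(transcript)) for r in rs]
    for name, r, q in zip(names, rs, benjamini_hochberg(ps)):
        print(name, round(r, 3), round(q, 4))
```

In a genome-wide setting the same correction would be applied across all transcript-metabolite pairs at once, since the number of tests grows with the product of the two feature counts.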

Publication types

  • Review

MeSH terms

  • Cluster Analysis
  • Genome*
  • Genomics / statistics & numerical data
  • Principal Component Analysis
  • Proteomics / statistics & numerical data
  • Software