A practical data processing workflow for multi-OMICS projects

Michael Kohl; Dominik A Megger; Martin Trippler; Hagen Meckel; Maike Ahrens; Thilo Bracht; Frank Weber; Andreas-Claudius Hoffmann; Hideo A Baba; Barbara Sitek; Jörg F Schlaak; Helmut E Meyer; Christian Stephan; Martin Eisenacher

doi:10.1016/j.bbapap.2013.02.029

Biochim Biophys Acta. 2014 Jan;1844(1 Pt A):52-62. doi: 10.1016/j.bbapap.2013.02.029. Epub 2013 Mar 15.

Authors

Michael Kohl¹, Dominik A Megger, Martin Trippler, Hagen Meckel, Maike Ahrens, Thilo Bracht, Frank Weber, Andreas-Claudius Hoffmann, Hideo A Baba, Barbara Sitek, Jörg F Schlaak, Helmut E Meyer, Christian Stephan, Martin Eisenacher

Affiliation

¹ Medizinisches Proteom-Center, Ruhr-Universitaet Bochum, Universitaetsstrasse 150, D-44801 Bochum, Germany. Electronic address: michael.kohl@rub.de.

PMID: 23501674

Abstract

Multi-OMICS approaches aim at integrating quantitative data obtained for different biological molecules in order to understand their interrelation and the functioning of larger systems. This paper addresses several data integration and data processing issues that frequently occur in this context. To this end, it presents the data processing workflow of the PROFILE project, a multi-OMICS project that aims at the identification of novel biomarkers and the development of new therapeutic targets for seven important liver diseases. Furthermore, a software tool called CrossPlatformCommander is outlined, which supports several steps of the proposed workflow in a semi-automatic manner. Application of the software is demonstrated for the detection of novel biomarkers, their ranking, and their annotation with existing knowledge, using corresponding Transcriptomics and Proteomics data sets obtained from patients suffering from hepatocellular carcinoma as an example. Additionally, a linear regression analysis of Transcriptomics vs. Proteomics data is presented and its performance is assessed. It was shown that a simple linear regression analysis is not sufficient for capturing profound relations between Transcriptomics and Proteomics data, and that alternative statistical approaches need to be implemented and evaluated. The integration of multivariate variable selection and classification approaches is also intended for further development of the software. Although this paper focuses only on the combination of data obtained from quantitative Proteomics and Transcriptomics experiments, several approaches and data integration steps are also applicable to other OMICS technologies. Keeping specific restrictions in mind, the suggested workflow (or at least parts of it) may be used as a template for similar projects that make use of different high-throughput techniques.
This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
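The regression of Transcriptomics vs. Proteomics data mentioned in the abstract can be illustrated with a minimal sketch. It is not the implementation used in CrossPlatformCommander; the function names, the gene labels, and all abundance values below are hypothetical and made up for illustration. The sketch assumes that each gene has a mean abundance in cases and in controls on both platforms, computes log2 fold changes, and fits an ordinary least squares line of protein fold change on transcript fold change.

```python
from math import log2


def log2_fold_change(case_mean, control_mean):
    """Return the log2 ratio of mean abundance in cases vs. controls."""
    return log2(case_mean / control_mean)


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Hypothetical (case mean, control mean) abundances per gene; NOT data from the paper.
transcript = {"GENE_A": (8.0, 2.0), "GENE_B": (4.0, 4.0),
              "GENE_C": (1.0, 4.0), "GENE_D": (6.0, 3.0)}
protein = {"GENE_A": (3.0, 1.5), "GENE_B": (2.0, 2.0),
           "GENE_C": (1.0, 2.0), "GENE_D": (4.0, 1.0)}

# Only genes quantified on both platforms can enter the regression.
genes = sorted(set(transcript) & set(protein))
x = [log2_fold_change(*transcript[g]) for g in genes]
y = [log2_fold_change(*protein[g]) for g in genes]

slope, intercept, r_squared = simple_linear_regression(x, y)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

In a setting like the one described in the abstract, a low coefficient of determination from such a fit would be one indication that a simple linear model does not capture the relation between the two data types.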

Keywords: Biomarker; Data processing workflow; Multi-OMICS; Quantitative Proteomics; Quantitative Transcriptomics; Regression analysis.

Abbreviations: (X)PlatCom, CrossPlatformCommander; ASCII, American Standard Code for Information Interchange; BC-FC, Box–Cox-transformed fold changes; BG, bio-molecule group; CRAN, The Comprehensive R Archive Network; D(eucl), Euclidean distance; FC, fold change; GUI, graphical user interface; HCC, hepatocellular carcinoma; HGNC, HUGO Gene Nomenclature Committee; KEGG, Kyoto Encyclopedia of Genes and Genomes; LD, liver disease; MeSH, Medical Subject Headings; OL(DLCT), DIGE-LC-MS-Transcriptomics overlap; RAID, redundant array of independent disks.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biomarkers / metabolism
Chromatography, Liquid
Mass Spectrometry
Proteomics*
Transcriptome*
Workflow*

Substances

Biomarkers