Meta-analytic principal component analysis in integrative omics application

SungHwan Kim; Dongwan Kang; Zhiguang Huo; Yongseok Park; George C Tseng

doi:10.1093/bioinformatics/btx765

Bioinformatics. 2018 Apr 15;34(8):1321-1328. doi: 10.1093/bioinformatics/btx765.

Authors

SungHwan Kim¹, Dongwan Kang², Zhiguang Huo³, Yongseok Park³, George C Tseng³,⁴

Affiliations

¹ Department of Statistics, Keimyung University, Daegu 42601, South Korea.
² Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
³ Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA.
⁴ Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA.

Abstract

Motivation: With the prevalent usage of microarray and massively parallel sequencing, numerous high-throughput omics datasets have become available in the public domain. Integrating abundant information among omics datasets is critical to elucidate biological mechanisms. Due to the high-dimensional nature of the data, methods such as principal component analysis (PCA) have been widely applied, aiming at effective dimension reduction and exploratory visualization.

Results: In this article, we combine multiple omics datasets that address identical or similar biological hypotheses and introduce two variations of a meta-analytic framework for PCA, namely MetaPCA. Regularization is further incorporated to facilitate sparse feature selection in MetaPCA. We apply MetaPCA and sparse MetaPCA to simulations, three transcriptomic meta-analysis studies (yeast cell cycle, prostate cancer and mouse metabolism) and a TCGA pan-cancer methylation study. The results show improved accuracy, robustness and exploratory visualization of the proposed framework.
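The abstract does not spell out how the two MetaPCA variations are defined, so the following is only a minimal Python sketch of one plausible way to pool PCA across studies: sum the per-study sample covariance matrices and take the leading eigenvectors as a shared set of components. The function name `pooled_pca`, its parameters and the simulated data are all hypothetical, and the sketch should not be read as the MetaPCA package's algorithm or API.

```python
import numpy as np


def pooled_pca(datasets, n_components=2):
    """Illustrative pooled PCA (an assumption, not the published method).

    datasets: list of arrays shaped (n_samples_k, n_features); every study
    must measure the same features in the same column order.
    """
    n_features = datasets[0].shape[1]
    pooled_cov = np.zeros((n_features, n_features))
    for X in datasets:
        Xc = X - X.mean(axis=0)                      # center each study separately
        pooled_cov += Xc.T @ Xc / (X.shape[0] - 1)   # add its sample covariance

    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so reorder to pick the directions with the largest pooled variance.
    eigvals, eigvecs = np.linalg.eigh(pooled_cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]

    # Project each centered study onto the shared components.
    projections = [(X - X.mean(axis=0)) @ components for X in datasets]
    return components, projections


# Hypothetical data: three simulated studies sharing five features.
rng = np.random.default_rng(0)
studies = [rng.normal(size=(n, 5)) for n in (20, 30, 25)]
components, projections = pooled_pca(studies, n_components=2)
```

Centering each study on its own mean removes study-level shifts before pooling, so every study's samples can be plotted in the same low-dimensional coordinate system.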

Availability and implementation: An R package, MetaPCA, is available online (http://tsenglab.biostat.pitt.edu/software.htm).

Contact: ctseng@pitt.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Animals
DNA Methylation
Gene Expression Profiling / methods*
Genomics / methods*
High-Throughput Nucleotide Sequencing
Humans
Male
Meta-Analysis as Topic*
Mice
Neoplasms / genetics
Neoplasms / metabolism
Principal Component Analysis / methods*
Software*
Yeasts / physiology

Grants and funding

R01 CA190766/CA/NCI NIH HHS/United States