Meta-analytic principal component analysis in integrative omics application

Bioinformatics. 2018 Apr 15;34(8):1321-1328. doi: 10.1093/bioinformatics/btx765.

Abstract

Motivation: With the widespread use of microarrays and massively parallel sequencing, numerous high-throughput omics datasets have become publicly available. Integrating the abundant information across omics datasets is critical to elucidating biological mechanisms. Because these data are high-dimensional, methods such as principal component analysis (PCA) have been widely applied for dimension reduction and exploratory visualization.

Results: In this article, we combine multiple omics datasets that address identical or similar biological hypotheses and introduce two variations of a meta-analytic PCA framework, called MetaPCA. Regularization is further incorporated to enable sparse feature selection in MetaPCA. We apply MetaPCA and sparse MetaPCA to simulations, to three transcriptomic meta-analyses (yeast cell cycle, prostate cancer and mouse metabolism) and to a TCGA pan-cancer methylation study. The results show improved accuracy, robustness and exploratory visualization with the proposed framework.
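The abstract says only that PCA is combined across several studies sharing a hypothesis, with regularization added for sparse feature selection. One simple way to realize that idea, assuming every study measures the same features in the same order, is to take the leading eigenvectors of the sum of the per-study covariance matrices, so each loading vector explains variance shared across studies. A soft-thresholding step inside power iteration gives a crude sparse variant. This is a sketch under those assumptions, not the MetaPCA package's API and not necessarily either of the paper's two variations; `meta_pca` and its `sparsity` parameter are hypothetical names, and the two toy studies are invented for illustration.

```python
import math
import random


def covariance(X):
    """Sample covariance (p x p) of one study; X is a list of n samples with p features."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    C = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - means[j] for j in range(p)]  # centre within this study only
        for a in range(p):
            for b in range(p):
                C[a][b] += d[a] * d[b]
    return [[c / (n - 1) for c in r] for r in C]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def soft_threshold(v, t):
    """Shrink each entry toward zero by t; entries with |x| <= t become exactly 0."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def meta_pca(studies, k=2, sparsity=0.0, iters=500, seed=0):
    """Hypothetical helper: top-k loading vectors shared by all studies.

    Assumes all studies share the same p features. sparsity in [0, 1)
    soft-thresholds each power-iteration step at that fraction of the
    largest |entry| (an illustrative heuristic, not the paper's penalty).
    """
    p = len(studies[0][0])
    # Sum of per-study covariances: each study gets equal weight.
    S = [[0.0] * p for _ in range(p)]
    for X in studies:
        C = covariance(X)
        for a in range(p):
            for b in range(p):
                S[a][b] += C[a][b]

    rng = random.Random(seed)
    loadings = []
    for _ in range(k):
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(p)])
        for _ in range(iters):  # power iteration on the combined matrix
            w = mat_vec(S, v)
            if sparsity > 0:
                w = soft_threshold(w, sparsity * max(abs(x) for x in w))
            v = normalize(w)
        lam = sum(a * b for a, b in zip(v, mat_vec(S, v)))  # Rayleigh quotient
        loadings.append(v)
        # Deflate so the next component captures the remaining shared variance.
        for a in range(p):
            for b in range(p):
                S[a][b] -= lam * v[a] * v[b]
    return loadings


# Two invented studies: features 0 and 1 co-vary strongly, feature 2 is small noise.
study_a = [[t, t + (0.1 if t % 2 else -0.1), 0.2 * (t % 3 - 1)] for t in range(20)]
study_b = [[2 * t, 2 * t + (0.1 if t % 3 else -0.1), 0.1 * (t % 2)] for t in range(15)]

pc1, pc2 = meta_pca([study_a, study_b], k=2)
(sparse_pc1,) = meta_pca([study_a, study_b], k=1, sparsity=0.5)
print([round(x, 3) for x in pc1], [round(x, 3) for x in sparse_pc1])
```

Summing covariances rather than pooling raw samples lets each study be centred separately, which removes study-level shifts in feature means; it does not correct for differing feature scales, so standardizing each study first is a common choice. In the sparse call, the noise feature's loading is driven to exactly zero while the shared pair keeps its weight.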

Availability and implementation: An R package, MetaPCA, is available online (http://tsenglab.biostat.pitt.edu/software.htm).

Contact: ctseng@pitt.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • DNA Methylation
  • Gene Expression Profiling / methods*
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Male
  • Meta-Analysis as Topic*
  • Mice
  • Neoplasms / genetics
  • Neoplasms / metabolism
  • Principal Component Analysis / methods*
  • Software*
  • Yeasts / physiology