An Integrated Approach for Efficient Multi-Omics Joint Analysis

ACM BCB. 2019 Sep:2019:619-625. doi: 10.1145/3307339.3343476.

Abstract

The challenges associated with multi-omics analysis, e.g. across the DNA-seq, RNA-seq, metabolomics, methylomics and microbiomics domains, include: (1) high dimensionality, as each -omics domain includes tens of thousands to hundreds of thousands of variables; (2) increased complexity in analyzing domain-domain interactions, which is quadratic for pairwise correlations and exponential for higher-order interactions; (3) variable heterogeneity, with highly skewed distributions in different units and scales, as for methylation and microbiome data. Here, we developed an efficient strategy for joint-domain analysis and applied it to an analysis of correlations between colon epithelium methylomics and fecal microbiomics data, with colorectal cancer risk estimated by colorectal polyp prevalence. First, we applied domain-specific standard pipelines for quality assessment, cleaning, batch-effect removal, etc. Second, we performed variable homogenization for both the methylation and microbiome data sets, using domain-specific normalization and dimension reduction to obtain scale-free variables that could be compared across the two domains. Finally, we implemented a joint-domain network analysis to identify relevant microbial-methylation island patterns. The network analysis considered all possible species-island pairs and was therefore quadratic in its complexity. However, we were able to pre-select the unpaired variables by performing a preliminary association analysis of each variable with the outcome, polyp prevalence. All results from the association and interaction analyses were adjusted for multiple comparisons.
Although the limited sample size did not provide good power (80% to detect medium to large effect sizes with 5% alpha error), a number of potentially significant associations (dozens in the uncorrected analysis, reduced to just a few in the corrected one) were identified. As a last step, we linked the network patterns identified by our approach to the KEGG functional ontology, showing that the method can generate new mechanistic hypotheses for the biological causes of polyp development.
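
The pre-selection and multiple-comparison steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the abstract does not name the correction procedure or the correlation measure, so the Benjamini-Hochberg adjustment and the Pearson correlation are assumptions, and all variable names (sp_a, isl_1, etc.), p-values and thresholds are hypothetical. The sketch shows how screening each domain against the outcome first shrinks the quadratic set of species-island pairs that must be tested.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the source only says results were adjusted for multiple
    comparisons; BH is used here purely as an example of such a correction.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def preselect(pvalues_by_name, alpha):
    """Keep variables whose outcome-association p-value is at most alpha."""
    return [name for name, p in pvalues_by_name.items() if p <= alpha]


# Hypothetical outcome-association p-values for each unpaired variable.
species_p = {"sp_a": 0.01, "sp_b": 0.50, "sp_c": 0.03}
island_p = {"isl_1": 0.02, "isl_2": 0.80}

# Without screening, every species-island pair would be tested.
all_pairs = list(product(species_p, island_p))

# With screening, only variables associated with the outcome are paired.
kept_species = preselect(species_p, alpha=0.05)
kept_islands = preselect(island_p, alpha=0.05)
candidate_pairs = list(product(kept_species, kept_islands))

print(len(all_pairs), "pairs before screening,", len(candidate_pairs), "after")
```

In this toy setting, each candidate pair would then be scored (for example with `pearson` on the homogenized variables) and the resulting pair-level p-values passed through `benjamini_hochberg` before any pattern is reported.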

Keywords: Methylation; bioinformatics; correlation; dimension reduction; joint analysis; microbiome; network; principal component analysis.