HCNM: Heterogeneous Correlation Network Model for Multi-level Integrative Study of Multi-omics Data for Cancer Subtype Prediction

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:1880-1886. doi: 10.1109/EMBC46164.2021.9630781.

Abstract

Integrative analysis of multi-omics data is important for biomedical applications, as it is required for a comprehensive understanding of biological function. Integrating multi-omics data serves multiple purposes, such as building an integrated data model, reducing the dimensionality of omic features, and clustering patients. For oncological data, patient clustering is synonymous with cancer subtype prediction. However, there is a gap in combining some of the widely used integrative analyses to build more powerful tools. To bridge this gap, we propose a multi-level integration algorithm that identifies a representative integrative subspace and uses it for cancer subtype prediction. The three integrative approaches we implement on multi-omics features are (1) multivariate multiple (linear) regression of the features from a cohort of patients/samples, (2) network construction using different omics features, and (3) fusion of sample similarity networks across the features. We use a type of multilayer network, called a heterogeneous network, as a data model to transition between a network-free (NF) regression model and a network-based (NB) model, which uses correlation networks. The heterogeneous networks consist of intra- and inter-layer graphs. Our proposed heterogeneous correlation network model, HCNM, is central to our algorithm for gene ranking, integrative subspace identification, and tumor-specific subtype prediction. The genes of our representative integrative subspace have been subjected to gene-ontology enrichment analysis and found to exhibit significant gene-disease association (GDA) scores. The gene subspace, which comprises less than 5% of the total gene set of each genomic feature, is used with the NB fusion integrative model to predict sample subtypes.
Because the identified integrative subspace of the multi-omics data is less prone to noise, bias, and outliers, our experiments show that the subtypes in our results agree with previous benchmark studies and better separate patient cohorts with poor and good survival.

Clinical relevance: Finding significant cancer-specific genes and cancer subtypes is vital for early prognosis and personalized treatment, and therefore improves a patient's survival probability.
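The abstract does not give implementation details, so the following is only a minimal sketch of one way a heterogeneous correlation network with intra- and inter-layer graphs could be assembled from per-omics matrices. It is not the authors' HCNM code. The function names, the Pearson correlation measure, the `threshold` value, and the degree-based ranking are all assumptions made for illustration; the 5% cut-off merely echoes the subspace size mentioned in the abstract. Features with zero variance are assumed to have been removed beforehand.

```python
import numpy as np


def correlation_block(a, b):
    """Pearson correlation between the columns of a and the columns of b.

    Both inputs are (samples x features) arrays over the same samples.
    """
    az = (a - a.mean(axis=0)) / a.std(axis=0)
    bz = (b - b.mean(axis=0)) / b.std(axis=0)
    return az.T @ bz / a.shape[0]


def heterogeneous_correlation_network(layers, threshold=0.5):
    """Build a binary adjacency matrix over all features of all layers.

    Diagonal blocks are intra-layer graphs and off-diagonal blocks are
    inter-layer graphs. `layers` maps a (hypothetical) omics name to a
    (samples x features) array; `threshold` is an assumed cut-off on the
    absolute correlation.
    """
    names = list(layers)
    blocks = [
        [correlation_block(layers[r], layers[c]) for c in names]
        for r in names
    ]
    corr = np.block(blocks)
    adj = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)  # drop self-loops
    return adj


def rank_features_by_degree(adj, top_fraction=0.05):
    """Return indices of the highest-degree nodes (assumed ranking rule)."""
    degree = adj.sum(axis=1)
    k = max(1, int(np.ceil(top_fraction * adj.shape[0])))
    order = np.argsort(-degree, kind="stable")
    return order[:k]
```

A call such as `heterogeneous_correlation_network({"expr": expr, "meth": meth})` returns one adjacency matrix whose node order follows the layer order, so the selected indices can be mapped back to the features of each layer. The regression step and the fusion of sample similarity networks described in the abstract are not covered by this sketch.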

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Genomics*
  • Humans
  • Neoplasms* / genetics