Integration of multi-omics data to mine cancer-related gene modules

J Bioinform Comput Biol. 2019 Dec;17(6):1950038. doi: 10.1142/S0219720019500380.

Abstract

The identification of cancer-related genes is a major research goal, with implications for determining the pathogenesis of cancer and identifying biomarkers for early diagnosis and treatment. In this study, we propose a network model-based method for mining cancer-related genes that integrates multi-omics data, including gene expression, DNA copy number variation, DNA methylation, transcription factor, miRNA, and lncRNA data. First, a random forest-based feature selection method is used to integrate the multi-omics data and identify the key regulatory factors that affect gene expression, from which genome-wide regulatory networks are constructed. Next, the regulatory networks of key candidate genes in variant samples are compared with those in non-variant samples to generate a differential expression regulatory network, which contains the collection of abnormal regulatory genes for the key candidate genes. Then, with functional similarity introduced as a distance metric for gene sets, a density-based clustering method is used to mine gene modules related to cancer. We applied this method to LUSC (lung squamous cell carcinoma) and mined cancer-related gene modules composed of 20 genes. GO function and KEGG pathway analyses indicated that the modules were closely related to cancer. A survival analysis verified that the mined gene modules can effectively distinguish between high- and low-risk groups. Overall, these results suggest that the proposed method can be used to identify cancer-related gene modules, providing a basis for the development of biomarkers for diagnosis and treatment.
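The last two steps of the pipeline (differential network, then density-based clustering under a functional-similarity distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm or the similarity measure, so the sketch assumes DBSCAN and a Jaccard distance over functional annotation terms as stand-ins, and it assumes the differential network is the symmetric difference of the two regulatory gene sets. All gene names, term names, and parameter values (`eps`, `min_pts`) are hypothetical toy data.

```python
from collections import deque


def differential_regulators(variant_net, normal_net):
    """For each candidate gene, keep regulatory genes present in only one network.

    Assumption: the 'differential network' is modeled as a symmetric difference.
    """
    genes = set(variant_net) | set(normal_net)
    return {
        g: set(variant_net.get(g, ())) ^ set(normal_net.get(g, ()))
        for g in genes
    }


def jaccard_distance(terms_a, terms_b):
    """Stand-in functional distance: 1 - |A & B| / |A | B| over annotation terms."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return 1.0 - len(terms_a & terms_b) / len(union)


def dbscan(items, dist, eps, min_pts):
    """Minimal DBSCAN over an arbitrary distance; returns {item: cluster id}, -1 = noise."""
    labels = {}

    def neighbors(p):
        return [q for q in items if dist(p, q) <= eps]

    cluster = -1
    for p in items:
        if p in labels:
            continue
        nb = neighbors(p)
        if len(nb) < min_pts:
            labels[p] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[p] = cluster
        queue = deque(nb)
        while queue:
            q = queue.popleft()
            if labels.get(q) == -1:
                labels[q] = cluster  # noise reached from a core point joins as border
            if q in labels:
                continue
            labels[q] = cluster
            q_nb = neighbors(q)
            if len(q_nb) >= min_pts:
                queue.extend(q_nb)  # q is a core point, so keep expanding
    return labels


# Hypothetical regulatory gene sets of one candidate gene in each sample group.
variant_net = {"candidate1": {"geneA", "geneB", "geneD"}}
normal_net = {"candidate1": {"geneA", "geneB", "geneE"}}
diff = differential_regulators(variant_net, normal_net)

# Hypothetical functional annotations (e.g. GO-like terms) per gene.
annotations = {
    "geneA": {"t1", "t2", "t3"},
    "geneB": {"t1", "t2", "t3", "t4"},
    "geneC": {"t2", "t3", "t4"},
    "geneD": {"t8", "t9"},
    "geneE": {"t8", "t9", "t10"},
    "geneF": {"t20"},
}
labels = dbscan(
    sorted(annotations),
    lambda a, b: jaccard_distance(annotations[a], annotations[b]),
    eps=0.6,
    min_pts=2,
)
print(diff)
print(labels)
```

On this toy input, genes with overlapping annotations fall into the same module and the gene with no shared terms is left as noise. A real analysis would replace the Jaccard stand-in with whichever functional similarity measure the study uses and would tune `eps` and `min_pts` on the data.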

Keywords: Multi-omics data; clustering; feature selection; gene regulation; network model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Carcinoma, Squamous Cell / genetics
  • Carcinoma, Squamous Cell / mortality
  • Cluster Analysis
  • Computational Biology / methods*
  • DNA Copy Number Variations
  • Gene Expression Regulation, Neoplastic*
  • Genome, Human
  • Humans
  • Lung Neoplasms / genetics
  • Lung Neoplasms / mortality
  • MicroRNAs
  • Models, Biological
  • Neoplasms / genetics*
  • RNA, Long Noncoding
  • Random Allocation
  • Survival Analysis
  • Transcription Factors / genetics
  • Workflow

Substances

  • MicroRNAs
  • RNA, Long Noncoding
  • Transcription Factors