Machine learning and network-based models to identify genetic risk factors to the progression and survival of colorectal cancer

Comput Biol Med. 2021 Aug:135:104539. doi: 10.1016/j.compbiomed.2021.104539. Epub 2021 Jun 8.

Abstract

Colorectal cancer (CRC) is one of the most common and lethal malignancies. Determining how identified risk factors drive the formation and development of CRC could be essential for developing effective therapies. To this end, we investigated how altered gene expression resulting from exposure to putative CRC risk factors contributes to the identification of prognostic biomarkers. Differentially expressed genes (DEGs) were first identified for CRC and for eight other risk factors. Gene set enrichment analysis (GSEA) over molecular pathways and gene ontology (GO) terms, together with protein-protein interaction (PPI) network analysis, was then used to predict the functions of these DEGs. The identified genes were cross-referenced against the dbGaP and OMIM databases to compare them with known prognostic CRC biomarkers. The survival time of CRC patients was also examined with a prognostic model based on Cox proportional hazards regression, integrating transcriptome data from The Cancer Genome Atlas (TCGA). In this study, PPI analysis identified 4 sub-networks and 8 hub genes that may be potential therapeutic targets, including CXCL8, ICAM1, SOD2, CXCL2, CCL20, OIP5, BUB1, ASPM and IL1RN. We also identified seven signature genes (PRR5.ARHGAP8, CA7, NEDD4L, GFR2, ARHGAP8, SMTN, OIP5) in independent analyses, of which PRR5.ARHGAP8 was found both in multivariate analyses and in analyses that combined gene expression with clinical information. This approach provides mechanistic information and, when combined with predictive clinical information, good evidence that the identified genes are significant biomarkers of processes involved in CRC progression and survival.
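The first step of the pipeline, selecting differentially expressed genes, typically combines an effect-size filter with a multiple-testing correction. The abstract does not name the DEG tool or thresholds, so the sketch below is only an illustration under stated assumptions: raw per-gene p-values are taken as already computed by some per-gene test, the Benjamini-Hochberg false discovery rate correction is applied, and the cutoffs |log2 fold change| ≥ 1 and FDR < 0.05 are common conventions assumed here, not values taken from the study. The gene names and numbers are hypothetical.

```python
# Hypothetical per-gene summary statistics (not from the study):
# gene -> (log2 fold change, raw p-value from a per-gene test).
stats = {
    "G1": (2.5, 0.001),
    "G2": (-1.8, 0.004),
    "G3": (0.3, 0.0001),
    "G4": (3.0, 0.2),
}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(stats, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Genes passing both an absolute log2 fold-change and an FDR threshold."""
    genes = list(stats)
    padj = benjamini_hochberg([stats[g][1] for g in genes])
    return sorted(
        g for g, q in zip(genes, padj)
        if abs(stats[g][0]) >= lfc_cutoff and q < fdr_cutoff
    )


print(select_degs(stats))  # ['G1', 'G2']
```

In this toy input, G3 is highly significant but its fold change is too small, while G4 has a large fold change that does not survive FDR correction; only genes passing both filters are kept.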
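Hub genes in a PPI network are usually the most highly connected nodes. The abstract does not state which centrality measure was used to pick the hubs, so the following is a minimal sketch assuming plain degree centrality, one common choice among several. The edge list uses hypothetical protein labels rather than interactions reported in the study, and the helper name `top_hubs` is illustrative.

```python
from collections import Counter


def top_hubs(edges, k):
    """Return the k nodes with the highest degree in an undirected network.

    Self-loops and duplicate interactions are ignored; ties are broken
    alphabetically so the result is deterministic.
    """
    degree = Counter()
    seen = set()
    for a, b in edges:
        if a == b:
            continue  # a self-loop adds no partner
        key = frozenset((a, b))
        if key in seen:
            continue  # (A, B) and (B, A) are the same interaction
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:k]]


# Toy network with hypothetical protein labels (not data from the study).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
         ("P2", "P3"), ("P4", "P5"), ("P2", "P1")]
print(top_hubs(edges, 2))  # ['P1', 'P2']
```

Deduplicating undirected edges matters because interaction databases often list the same pair in both orientations, which would otherwise double-count a node's degree.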
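The survival model rests on Cox proportional hazards regression, which estimates a log hazard ratio by maximizing the partial likelihood over risk sets (patients still under observation at each event time). The study's model is multivariate and built on TCGA data. The sketch below is deliberately reduced to a single covariate (one gene's expression) with the Breslow form of the partial likelihood, fitted by Newton-Raphson with step-halving. The data are hypothetical, and in practice an established implementation such as R's `survival::coxph` or the Python `lifelines` package would be the sensible choice.

```python
import math

# Hypothetical toy data (not from the study): follow-up time, event flag
# (1 = death observed, 0 = censored) and one gene's expression per patient.
times = [5, 8, 12, 3, 9, 15, 7, 11]
events = [1, 1, 0, 1, 1, 0, 1, 1]
expr = [1.2, 0.4, -0.5, 2.0, -0.3, -1.1, 0.9, 1.5]


def _risk_set_sums(times, x, beta, t):
    """Weighted sums over the risk set {j : times[j] >= t}."""
    s0 = s1 = s2 = 0.0
    for tj, xj in zip(times, x):
        if tj >= t:
            w = math.exp(beta * xj)
            s0 += w
            s1 += w * xj
            s2 += w * xj * xj
    return s0, s1, s2


def cox_loglik(times, events, x, beta):
    """Breslow partial log-likelihood for a single covariate."""
    ll = 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei:
            s0, _, _ = _risk_set_sums(times, x, beta, ti)
            ll += beta * xi - math.log(s0)
    return ll


def cox_score_info(times, events, x, beta):
    """Score (first derivative) and information (minus second derivative)."""
    score = info = 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei:
            s0, s1, s2 = _risk_set_sums(times, x, beta, ti)
            mean = s1 / s0
            score += xi - mean
            info += s2 / s0 - mean * mean
    return score, info


def fit_cox_1d(times, events, x, max_iter=100, tol=1e-10):
    """Newton-Raphson with step-halving; returns the log hazard ratio beta."""
    beta = 0.0
    ll = cox_loglik(times, events, x, beta)
    for _ in range(max_iter):
        score, info = cox_score_info(times, events, x, beta)
        step = score / info
        # Halve the step until the partial log-likelihood does not decrease.
        while True:
            candidate = beta + step
            new_ll = cox_loglik(times, events, x, candidate)
            if new_ll >= ll or abs(step) < tol:
                break
            step /= 2
        beta, ll = candidate, new_ll
        if abs(step) < tol:
            break
    return beta


beta = fit_cox_1d(times, events, expr)
print(f"beta = {beta:.4f}, hazard ratio = {math.exp(beta):.4f}")
```

A positive beta (hazard ratio above 1) means higher expression is associated with earlier death in this toy data. A multivariate model replaces the scalar beta with a coefficient vector and the information with a matrix, but the risk-set structure is the same.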

Keywords: Alcohol consumption; Colorectal cancer; Differentially expressed genes; Smoking; Survival analysis.

MeSH terms

  • Biomarkers, Tumor / genetics
  • Colorectal Neoplasms* / genetics
  • Cytoskeletal Proteins
  • Databases, Genetic
  • GTPase-Activating Proteins
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Machine Learning
  • Muscle Proteins
  • Risk Factors
  • Transcriptome

Substances

  • ARHGAP8 protein, human
  • Biomarkers, Tumor
  • Cytoskeletal Proteins
  • GTPase-Activating Proteins
  • Muscle Proteins
  • SMTN protein, human