Machine Learning Predicts the Oxidative Stress Subtypes Provide an Innovative Insight into Colorectal Cancer

Oxid Med Cell Longev. 2023 Apr 21:2023:1737501. doi: 10.1155/2023/1737501. eCollection 2023.

Abstract

It is now broadly accepted that molecular subtypes are defined by genomic heterogeneity and immune infiltration patterns. Because oxidative stress (OS) is involved in tumorigenesis and is informative for prognosis prediction, we propose a new classification of colorectal cancer (CRC) into OS subtypes. We obtained three datasets from The Cancer Genome Atlas Program (TCGA) and Gene Expression Omnibus (GEO) online databases and selected 1399 OS-related genes from the GeneCards database. After removing batch effects, we performed differentially expressed gene (DEG) analyses between normal and tumor samples. Nonnegative matrix factorization (NMF) was used for unsupervised clustering, and LASSO regression and Cox regression were used to construct the signature model. DEGs, robust rank aggregation (RRA), and protein-protein interaction (PPI) networks were used to select hub genes, which were then used to predict OS subtypes with a random forest algorithm. NMF identified two OS-related subtypes of CRC patients. Based on the DEGs between the two subtypes, an eight-gene OS-related signature was built to predict patient outcome. A total of 61 DEGs were shared by all three datasets, RRA identified 17 genes as important across the three datasets, and 15 genes were common to both methods. PPI network analysis confirmed five hub genes: SPP1, SERPINE1, CAV1, PDGFRB, and PLAU. These five hub genes predicted the OS-related subtype of CRC with an area under the curve (AUC) of 0.771. In summary, we identified two OS-related subtypes, which provide a new perspective on colorectal cancer.
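The robust rank aggregation (RRA) step combines per-dataset gene rankings into one consensus score. Below is a minimal, stdlib-only sketch. It assumes the order-statistic "rho score" formulation used by the RobustRankAggreg R package and complete rankings, meaning every gene appears in every list. The function names and the toy gene lists are hypothetical illustrations. They are not the study's data or code.

```python
from math import comb


def order_stat_cdf(x: float, k: int, n: int) -> float:
    """P(U_(k) <= x) for the k-th smallest of n i.i.d. Uniform(0, 1) values.

    This equals the Beta(k, n - k + 1) CDF, computed here as a binomial tail sum.
    """
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rra_scores(ranked_lists: list[list[str]]) -> dict[str, float]:
    """Corrected rho score per gene (smaller = more consistently top-ranked).

    Assumption: every gene appears in every list (complete rankings);
    list.index raises ValueError otherwise.
    """
    n = len(ranked_lists)
    scores = {}
    for gene in set(ranked_lists[0]):
        # Normalized rank of the gene in each list: position / list length, in (0, 1].
        r = sorted((lst.index(gene) + 1) / len(lst) for lst in ranked_lists)
        # One beta score per order statistic; rho is the smallest of them.
        rho = min(order_stat_cdf(r[k - 1], k, n) for k in range(1, n + 1))
        # Bonferroni-style correction for taking the minimum over n scores.
        scores[gene] = min(1.0, rho * n)
    return scores


# Hypothetical DEG rankings (most significant first) from three toy datasets.
lists = [
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    ["GENE_A", "GENE_C", "GENE_D", "GENE_B"],
    ["GENE_B", "GENE_A", "GENE_D", "GENE_C"],
]
scores = rra_scores(lists)
for gene in sorted(scores, key=scores.get):
    print(gene, round(scores[gene], 4))
```

In this toy input only GENE_A, ranked near the top of all three lists, gets a corrected score below 1. Real analyses typically also apply a significance cutoff to these scores. The abstract does not state which cutoff the authors used.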
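The hub-gene selection combines two steps: intersecting the DEG lists with the RRA results, then ranking the shared genes within a PPI network. The sketch below illustrates this under stated assumptions. Hubs are ranked by node degree, which is one common criterion; the abstract does not say which criterion the authors applied. The edge list is assumed undirected, with each interaction listed once. Gene names G1 to G6 and all inputs are hypothetical.

```python
from collections import Counter


def shared_genes(deg_sets: list[set[str]], rra_genes: set[str]) -> set[str]:
    """Genes that are DEGs in every dataset AND are retained by the RRA step."""
    overlap = set.intersection(*deg_sets)
    return overlap & rra_genes


def top_hubs(edges: list[tuple[str, str]], candidates: set[str], k: int) -> list[str]:
    """Rank candidate genes by degree in the PPI subnetwork restricted to candidates.

    Assumption: degree is the hub criterion, and each undirected edge appears once.
    """
    degree = Counter()
    for a, b in edges:
        # Keep only interactions between two distinct candidate genes.
        if a != b and a in candidates and b in candidates:
            degree[a] += 1
            degree[b] += 1
    # Sort by degree (descending); break ties alphabetically for reproducibility.
    return sorted(degree, key=lambda g: (-degree[g], g))[:k]


# Hypothetical inputs: DEG sets from three datasets, RRA-retained genes, PPI edges.
deg_sets = [
    {"G1", "G2", "G3", "G4", "G5"},
    {"G1", "G2", "G3", "G4", "G6"},
    {"G1", "G2", "G3", "G4", "G5", "G6"},
]
rra_genes = {"G1", "G2", "G3", "G4", "G5"}
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]

candidates = shared_genes(deg_sets, rra_genes)
print(sorted(candidates))
print(top_hubs(edges, candidates, k=2))
```

Restricting the network to the shared candidates before counting degrees keeps hub ranking focused on genes that passed both filters. Tools such as Cytoscape plug-ins offer other centrality measures that could be swapped in here.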
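The reported AUC summarizes how well the hub-gene classifier's scores separate the two OS subtypes. The random forest itself is not reproduced here. As a minimal sketch, the AUC can be computed from any per-sample predicted probability via the Mann-Whitney formulation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The labels and scores below are hypothetical.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC via the Mann-Whitney pairwise formulation (ties count as 0.5).

    labels: 1 for the positive class (e.g. one OS subtype), 0 otherwise.
    scores: predicted probability of the positive class for each sample.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical subtype labels and classifier probabilities for six samples.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))
```

The pairwise loop is O(n_pos × n_neg), which is fine for cohorts of a few hundred samples. For much larger cohorts, a rank-based computation gives the same value more efficiently.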

MeSH terms

  • Algorithms
  • Carcinogenesis
  • Colorectal Neoplasms* / genetics
  • Humans
  • Machine Learning*
  • Oxidative Stress / genetics