Integration of machine learning for developing a prognostic signature related to programmed cell death in colorectal cancer

Environ Toxicol. 2024 May;39(5):2908-2926. doi: 10.1002/tox.24157. Epub 2024 Feb 1.

Abstract

Background: Colorectal cancer (CRC) presents a significant global health burden, characterized by a heterogeneous molecular landscape and various genetic and epigenetic alterations. Programmed cell death (PCD) plays a critical role in CRC, offering potential targets for therapy by regulating cell elimination processes that can suppress tumor growth or trigger cancer cell resistance. Understanding the complex interplay between PCD mechanisms and CRC pathogenesis is crucial. This study aims to construct a PCD-related prognostic signature in CRC using machine learning integration, enhancing the precision of CRC prognosis prediction.

Method: We retrieved expression data and clinical information from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) datasets. Fifteen forms of PCD were identified, and the corresponding gene sets were compiled. Ten machine learning algorithms were integrated for model construction: Lasso, Ridge, elastic net (Enet), StepCox, survivalSVM, CoxBoost, SuperPC, plsRcox, random survival forest (RSF), and gradient boosting machine (GBM). The models were validated using six GEO datasets, and the programmed cell death score (PCDS) was established. The model's performance was then compared with that of 109 transcriptome-based CRC prognostic models.
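
The abstract does not say how candidate models were ranked across the validation datasets. A common choice for survival models is Harrell's concordance index (C-index) averaged over cohorts, and the sketch below assumes that criterion. The function names, the toy cohort, and the two scoring functions are hypothetical illustrations, not the authors' pipeline or data.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable patient pairs in which the
    patient with the higher risk score has the shorter survival time.
    Simplification: pairs with tied survival times are skipped."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is usable only if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    return concordant / comparable if comparable else float("nan")


def rank_models(cohorts, models):
    """Rank scoring functions by mean C-index over all cohorts.
    `cohorts` is a list of (times, events, feature_rows) tuples;
    `models` maps a name to a function: feature_row -> risk score."""
    mean_c = {}
    for name, score in models.items():
        values = [
            concordance_index(t, e, [score(row) for row in rows])
            for t, e, rows in cohorts
        ]
        mean_c[name] = sum(values) / len(values)
    return sorted(mean_c.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): one cohort, one feature per patient.
toy_cohort = ([5, 10, 15, 20], [1, 1, 0, 1], [[3.0], [2.0], [1.5], [1.0]])
toy_models = {
    "model_a": lambda row: row[0],    # higher expression -> higher risk
    "model_b": lambda row: -row[0],   # the opposite assumption
}
ranking = rank_models([toy_cohort], toy_models)
print(ranking)
```

In a real analysis each entry in `models` would be a fitted algorithm combination (for example, one method for variable selection followed by another for fitting), and `cohorts` would hold the independent validation datasets.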

Result: Our integrated model successfully identified differentially expressed PCD-related genes and stratified CRC samples into four subtypes with distinct prognostic implications. The optimal combination of machine learning models, RSF + Ridge, showed superior performance compared with traditional methods. The PCDS effectively stratified patients into high-risk and low-risk groups, with significant survival differences. Further analysis revealed the prognostic relevance of immune cell types and pathways associated with CRC subtypes. The model also identified hub genes and drug sensitivities relevant to CRC prognosis.
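
The stratification into high-risk and low-risk groups can be illustrated with a minimal sketch. The abstract does not state the cutoff used for the PCDS, so a median split is assumed here, and survival in each group is summarized with a plain Kaplan-Meier estimate. All names and numbers below are hypothetical; a formal comparison of the groups would normally add a log-rank test, which is not implemented in this sketch.

```python
from statistics import median


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score.
    The median cutoff is an assumption, not taken from the abstract."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate, returned as [(time, survival probability)]
    with one entry per distinct time at which an event was observed."""
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Toy data (hypothetical): risk scores, follow-up times, event flags.
scores = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8]
times = [30, 4, 22, 9, 27, 6]
events = [0, 1, 1, 1, 0, 1]

groups = split_by_median(scores)
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(label, curve)
```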

Conclusion: The current study highlights the potential of integrating machine learning models to enhance the prediction of CRC prognosis. The developed prognostic signature, which is related to PCD, holds promise for personalized and effective therapeutic interventions in CRC.

Keywords: colorectal cancer (CRC); experimental validation; machine learning algorithms; prognostic signature; programmed cell death (PCD).

MeSH terms

  • Apoptosis*
  • Colorectal Neoplasms* / genetics
  • Humans
  • Machine Learning
  • Prognosis