Regularized survival learning and cross-database analysis enabled identification of colorectal cancer prognosis-related immune genes

Front Genet. 2023 Feb 23:14:1148470. doi: 10.3389/fgene.2023.1148470. eCollection 2023.

Abstract

Colon adenocarcinoma is the most common type of colorectal cancer. The prognosis of advanced colorectal cancer patients who received treatment is still very poor. Therefore, identifying new biomarkers for prognosis prediction has important significance for improving treatment strategies. However, the power of biomarker analyses was limited by the used sample size of individual database. In this study, we combined Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) databases to expand the number of healthy tissue samples. We screened differentially expressed genes between the GTEx healthy samples and TCGA tumor samples. Subsequently, we applied least absolute shrinkage and selection operator (LASSO) regression and multivariate Cox analysis to identify nine prognosis-related immune genes: ANGPTL4, IDO1, NOX1, CXCL3, LTB4R, IL1RL2, CD72, NOS2, and NUDT6. We computed the risk scores of samples based on the expression levels of these genes and divided patients into high- and low-risk groups according to this risk score. Survival analysis results showed a significant difference in survival rate between the two risk groups. The high-risk group had a significantly lower overall survival rate and poorer prognosis. We found the receiver operating characteristic based on the risk score was showed to accurately predict patients' prognosis. These prognosis-related immune genes may be potential biomarkers for colorectal cancer diagnosis and treatment. Our open-source code is freely available from GitHub at https://github.com/gutmicrobes/Prognosis-model.git.

Keywords: LASSO; colorectal cancer; immune gene; multivariate cox analysis; prognosis.

Grants and funding

This research was funded by grants from the National Natural Science Foundation of China (grant number 61873027) and Open Project of the National Engineering Laboratory for Agri-product Quality Traceability (No.AQT-2020-YB6).