Laboratory blood parameters and machine learning for the prognosis of esophageal squamous cell carcinoma

Front Oncol. 2024 Apr 3;14:1367008. doi: 10.3389/fonc.2024.1367008. eCollection 2024.

Abstract

Background: Predicting mortality in patients with esophageal squamous cell carcinoma (ESCC) requires precise and expedient prognostic methods.

Objective: To develop and validate a prognostic model tailored to ESCC patients, leveraging the power of machine learning (ML) techniques and drawing insights from comprehensive datasets of laboratory-derived blood parameters.

Methods: Three ML approaches, Gradient Boosting Machine (GBM), Random Survival Forest (RSF), and the classical Cox method, were used to develop models on a dataset of 2521 ESCC patients with 27 features. The models were evaluated by concordance index (C-index) and time-dependent receiver operating characteristic (time-ROC) curves. We used the optimal model to evaluate the correlation between features and prognosis and to divide patients into low- and high-risk groups by risk stratification. Its performance was analyzed with Kaplan-Meier curves and by comparison with the AJCC 8th edition stage. We further evaluated the overall effectiveness of the model in ESCC subgroups using risk scores and kernel density estimation (KDE) plots.
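The concordance index named in the Methods can be illustrated with a minimal sketch of Harrell's C-index for right-censored data. This is not the authors' code: the function name `concordance_index` and the five-patient toy data are hypothetical, and pairs with tied survival times are simply skipped, which is a simplifying assumption.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event. The pair is concordant when that patient also
    has the higher predicted risk; tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: ignore tied times
        # a = patient with the shorter time, b = patient with the longer time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in months, event flag (1 = death,
# 0 = censored), and a model risk score where higher means worse prognosis.
toy_times = [5, 8, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
toy_risks = [0.9, 0.7, 0.8, 0.3, 0.1]

# 7 of the 8 comparable pairs are ranked correctly.
c_index = concordance_index(toy_times, toy_events, toy_risks)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported C-index of 0.746.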

Results: RSF's C-index (0.746) and AUC (three-year AUC 0.761, five-year AUC 0.771) had a slight advantage over those of GBM and the classical Cox method. Subsequently, 14 features (N stage, T stage, surgical margin, tumor length, age, number of dissected lymph nodes, MCH, Na, FIB, DBIL, CL, treatment, vascular invasion, and tumor grade) were selected to build the model. Based on these, we found a significant difference in survival between low-risk (3-year OS 81.8%, 5-year OS 69.8%) and high-risk (3-year OS 25.1%, 5-year OS 11.5%) patients in the training set, which was also verified in the test set (all P < 0.0001). Compared with the AJCC 8th edition staging system, the model showed greater discriminative ability while remaining in good agreement with that system's staging.
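The 3-year and 5-year OS figures per risk group are the kind of quantity a Kaplan-Meier estimate yields after patients are split by risk score. The sketch below shows that computation on a hypothetical eight-patient cohort. The abstract does not state how the risk cutoff was chosen, so the median split used here is an assumption, and the helper names `kaplan_meier` and `survival_at` are illustrative only.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate: list of (event_time, S(t)) pairs."""
    survival = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve, horizon):
    """Value of the step function S(t) at the given horizon."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s


# Hypothetical toy cohort: model risk score, follow-up in months,
# and event flag (1 = death, 0 = censored).
risk = [0.2, 0.3, 0.1, 0.4, 0.8, 0.9, 0.7, 0.6]
months = [70, 40, 65, 62, 10, 20, 30, 66]
died = [0, 1, 0, 0, 1, 1, 1, 0]

cutoff = median(risk)  # assumed cutoff; the source does not specify one
groups = {"low": [], "high": []}
for r, t, e in zip(risk, months, died):
    groups["high" if r > cutoff else "low"].append((t, e))

os_rates = {}
for name, rows in groups.items():
    curve = kaplan_meier([t for t, _ in rows], [e for _, e in rows])
    # Overall survival at 36 months (3 years) and 60 months (5 years)
    os_rates[name] = (survival_at(curve, 36), survival_at(curve, 60))
    print(name, os_rates[name])
```

In practice the two curves would also be compared with a log-rank test, which is presumably the source of the reported P values, though the abstract does not name the test.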

Conclusion: We developed an ESCC prognostic model with good performance using clinical features and laboratory blood parameters.

Keywords: esophageal squamous cell carcinoma; laboratory blood parameters; machine learning; prognosis; random survival forest.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was funded by the Chengdu Medical Association Research Project (2022417), the Jianyang City People's Hospital Research Project (JY202234), and the Sichuan Science and Technology Program (No. 2022NSFSC1513).