Machine learning models predict lymph node metastasis in patients with stage T1-T2 esophageal squamous cell carcinoma

Front Oncol. 2022 Sep 8:12:986358. doi: 10.3389/fonc.2022.986358. eCollection 2022.

Abstract

Background: For patients with stage T1-T2 esophageal squamous cell carcinoma (ESCC), accurately predicting lymph node metastasis (LNM) remains challenging. We aimed to investigate the performance of machine learning (ML) models for predicting LNM in patients with stage T1-T2 ESCC.

Methods: Patients with T1-T2 ESCC treated at three centers between January 2014 and December 2019 were included in this retrospective study and divided into a training set and an external test set. All patients underwent esophagectomy, and LNM status was determined by pathological examination. Thirty-six ML models were developed by combining six modeling algorithms with six feature selection techniques. The optimal model was selected using the bootstrap method, and the external test set was then used to assess its generalizability and effectiveness. Prediction performance was evaluated using the area under the receiver operating characteristic curve (AUC).
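The evaluation design described in the Methods can be illustrated with a minimal sketch, which is not the authors' code. It enumerates a 6 x 6 grid of algorithm and feature-selection pairs and estimates a bootstrapped AUC for one set of predictions. Only "naive_bayes" and "determination_coefficient" are named in this abstract; every other grid entry is a hypothetical placeholder, the labels and scores are synthetic toy values, and the resampling scheme (simple resampling with replacement, 200 replicates) is an assumption, since the abstract does not specify it.

```python
import itertools
import random
import statistics


def auc(labels, scores):
    """AUC as the probability that a random positive case outranks a
    random negative case, with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_aucs(labels, scores, n_boot=200, seed=0):
    """Resample cases with replacement and recompute the AUC each time.
    Resamples that contain only one class are skipped."""
    rng = random.Random(seed)
    n = len(labels)
    results = []
    while len(results) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        results.append(auc(ys, [scores[i] for i in idx]))
    return results


# Only the first entry of each list is named in the abstract;
# the remaining names are hypothetical placeholders.
algorithms = ["naive_bayes"] + [f"algorithm_{i}" for i in range(2, 7)]
selectors = ["determination_coefficient"] + [f"selector_{i}" for i in range(2, 7)]
model_grid = list(itertools.product(algorithms, selectors))  # 6 x 6 = 36 candidates

# Synthetic toy data (1 = LNM present); not taken from the study.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.6]
boot = bootstrap_aucs(toy_labels, toy_scores)
print(len(model_grid), round(statistics.median(boot), 3))
```

In a real analysis, each of the 36 grid entries would be fitted on the training set and scored in this way, and the candidate with the best bootstrapped AUC would then be checked once on the external test set.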

Results: Of the 1097 included patients, 294 (26.8%) had LNM. The ML models based on clinical features showed good predictive performance for LNM status, with a median bootstrapped AUC of 0.659 (range: 0.592-0.715). The optimal model, which used the naive Bayes algorithm with feature selection by determination coefficient, had the highest AUC of 0.715 (95% CI: 0.671-0.763). In the external test set, the optimal ML model achieved an AUC of 0.752 (95% CI: 0.674-0.829), which was superior to that of T stage (AUC: 0.624; 95% CI: 0.547-0.701).
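The summary statistics reported in the Results (a proportion, a median with range, and a 95% CI) can be reproduced in form with a short sketch. The prevalence calculation uses the counts given above. The AUC values passed to `summarize` are synthetic, and the percentile method for the interval is an assumption, because the abstract does not state how the confidence intervals were derived.

```python
import statistics


def percentile(sorted_vals, q):
    """Linear-interpolation percentile for q in [0, 1] on pre-sorted data."""
    if not sorted_vals:
        raise ValueError("empty input")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarize(aucs):
    """Median, range, and percentile-based 95% interval of AUC values."""
    vals = sorted(aucs)
    return {
        "median": statistics.median(vals),
        "min": vals[0],
        "max": vals[-1],
        "ci95": (percentile(vals, 0.025), percentile(vals, 0.975)),
    }


# LNM prevalence from the reported counts: 294 of 1097 patients.
lnm_rate = 294 / 1097
print(f"{lnm_rate:.1%}")  # 26.8%

# Synthetic AUC values for illustration only; not study results.
summary = summarize([0.60, 0.62, 0.64, 0.66, 0.68, 0.70])
print(summary)
```

Applied across the 36 candidate models, the median and range describe how performance varies between models. Applied to the bootstrap replicates of a single model, the percentile interval gives one common form of 95% CI.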

Conclusions: ML models showed good predictive value for LNM in patients with stage T1-T2 ESCC, and the naive Bayes algorithm with feature selection by determination coefficient performed best.

Keywords: esophageal squamous cell carcinoma; lymph node metastasis; machine learning; predictive model; stage T1-T2.