Machine learning model based on enhanced CT radiomics for the preoperative prediction of lymphovascular invasion in esophageal squamous cell carcinoma

Front Oncol. 2024 Feb 23;14:1308317. doi: 10.3389/fonc.2024.1308317. eCollection 2024.

Abstract

Objective: To evaluate the value of a machine learning model using enhanced CT radiomics features in the prediction of lymphovascular invasion (LVI) of esophageal squamous cell carcinoma (ESCC) before treatment.

Methods: We reviewed and analyzed the enhanced CT images of 258 ESCC patients from June 2017 to December 2019. Patients were randomly assigned in a 7:3 ratio to a training set (182 cases) and a validation set (76 cases). Clinical risk factors and CT image characteristics were recorded, and multivariate logistic regression was used to identify independent risk factors for LVI in ESCC patients. We extracted CT radiomics features using the FAE software, screened them with the maximum relevance and minimum redundancy (MRMR) and least absolute shrinkage and selection operator (LASSO) algorithms, and then established a radiomics label for each patient. Five machine learning algorithms, namely support vector machine (SVM), K-nearest neighbor (KNN), logistic regression (LR), Gaussian naive Bayes (GNB), and multilayer perceptron (MLP), were used to build models on the radiomics labels, and the clinical features were screened. The predictive efficacy of the machine learning models for LVI of ESCC was evaluated using the receiver operating characteristic (ROC) curve.
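
The modelling steps described above can be sketched with scikit-learn. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the FAE software is not used, the MRMR step is omitted, feature selection is approximated by keeping the 14 features with the largest absolute LASSO coefficients, and the stratified split, `alpha`, and all classifier settings are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for the radiomics matrix (258 patients, 200 features).
X, y = make_classification(n_samples=258, n_features=200, n_informative=14,
                           random_state=0)

# 182 training and 76 validation cases, as in the abstract.
# ASSUMPTION: the split is stratified by label; the abstract only says "randomly".
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=76, stratify=y, random_state=0)


def build_model(classifier):
    """Scale, keep the 14 features with the largest |LASSO coefficient|, classify."""
    selector = SelectFromModel(Lasso(alpha=0.01, max_iter=10000),
                               max_features=14, threshold=-np.inf)
    return make_pipeline(StandardScaler(), selector, classifier)


def scores(model, X):
    """Return continuous scores suitable for ROC analysis."""
    if hasattr(model, "decision_function"):
        return model.decision_function(X)
    return model.predict_proba(X)[:, 1]


# The five algorithm families named in the abstract; hyperparameters are assumptions.
classifiers = {
    "SVM": SVC(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LR": LogisticRegression(max_iter=1000),
    "GNB": GaussianNB(),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}

results = {}
for name, clf in classifiers.items():
    model = build_model(clf).fit(X_train, y_train)
    results[name] = (roc_auc_score(y_train, scores(model, X_train)),
                     roc_auc_score(y_val, scores(model, X_val)))
    print(f"{name}: train AUC = {results[name][0]:.3f}, "
          f"validation AUC = {results[name][1]:.3f}")
```

Because the selector sits inside the pipeline, it is fitted on the training cases only, so the validation AUC is not influenced by feature selection on validation data. The AUC values printed here come from synthetic data and say nothing about the study's results.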

Results: Tumor thickness [OR = 1.189, 95% confidence interval (CI) 1.060-1.351, P = 0.005], tumor-to-normal wall enhancement ratio (TNR) (OR = 2.966, 95% CI 1.174-7.894, P = 0.024), and clinical N stage (OR = 5.828, 95% CI 1.752-20.811, P = 0.005) were identified as independent risk factors for LVI. We extracted 1,316 features from preoperative enhanced CT images and selected 14 radiomics features using MRMR and LASSO to construct the radiomics labels. In the validation set, the SVM, KNN, LR, and GNB models showed high predictive performance, while the MLP model performed poorly. The KNN and SVM models had area under the curve (AUC) values of 0.945 and 0.905 in the training set, which decreased to 0.866 and 0.867 in the validation set, indicating significant overfitting. The GNB and LR models had AUC values of 0.905 and 0.911 in the training set and 0.900 and 0.893 in the validation set, showing stable performance with good fit and predictive ability. The MLP model had AUC values of 0.658 and 0.674 in the training and validation sets, respectively, indicating poor performance. A multiscale combined model constructed using multivariate logistic regression had AUCs of 0.911 (0.870-0.951) and 0.893 (0.840-0.962), accuracies of 84.4% and 79.7%, sensitivities of 90.8% and 87.1%, and specificities of 80.5% and 79.0% in the training and validation sets, respectively.
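
The overfitting comparison above comes down to the gap between training and validation AUC for each model. A short sketch of that arithmetic, using only the AUC values reported in this abstract (the variable names are illustrative):

```python
# (training AUC, validation AUC) as reported in the abstract.
reported_auc = {
    "KNN": (0.945, 0.866),
    "SVM": (0.905, 0.867),
    "GNB": (0.905, 0.900),
    "LR": (0.911, 0.893),
    "MLP": (0.658, 0.674),
}

# A positive gap means the AUC dropped from training to validation.
gaps = {name: round(train - val, 3) for name, (train, val) in reported_auc.items()}

for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gap:+.3f}")
```

The gaps are largest for KNN (0.079) and SVM (0.038) and smallest for LR (0.018) and GNB (0.005); the MLP gap is slightly negative (-0.016), although its AUC is low in both sets. This is a descriptive comparison only and does not test whether any difference is statistically significant.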

Conclusion: Machine learning models based on enhanced CT radiomics features can effectively predict LVI status preoperatively in patients with ESCC. The GNB and LR models exhibit good stability and may offer a new approach to the non-invasive prediction of LVI status in ESCC patients before treatment.

Keywords: ESCC; computed tomography; lymphovascular invasion; machine learning; radiomics.

Grants and funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.