A Machine Learning-Based Predictive Model for Predicting Lymph Node Metastasis in Patients With Ewing's Sarcoma

Front Med (Lausanne). 2022 Apr 6:9:832108. doi: 10.3389/fmed.2022.832108. eCollection 2022.

Abstract

Objective: In order to provide reference for clinicians and bring convenience to clinical work, we seeked to develop and validate a risk prediction model for lymph node metastasis (LNM) of Ewing's sarcoma (ES) based on machine learning (ML) algorithms.

Methods: Clinicopathological data of 923 ES patients from the Surveillance, Epidemiology, and End Results (SEER) database and 51 ES patients from multi-center external validation set were retrospectively collected. We applied ML algorithms to establish a risk prediction model. Model performance was checked using 10-fold cross-validation in the training set and receiver operating characteristic (ROC) curve analysis in external validation set. After determining the best model, a web-based calculator was made to promote the clinical application.

Results: LNM was confirmed or unable to evaluate in 13.86% (135 out of 974) ES patients. In multivariate logistic regression, race, T stage, M stage and lung metastases were independent predictors for LNM in ES. Six prediction models were established using random forest (RF), naive Bayes classifier (NBC), decision tree (DT), xgboost (XGB), gradient boosting machine (GBM), logistic regression (LR). In 10-fold cross-validation, the average area under curve (AUC) ranked from 0.705 to 0.764. In ROC curve analysis, AUC ranged from 0.612 to 0.727. The performance of the RF model ranked best. Accordingly, a web-based calculator was developed (https://share.streamlit.io/liuwencai2/es_lnm/main/es_lnm.py).

Conclusion: With the help of clinicopathological data, clinicians can better identify LNM in ES patients. Risk prediction models established in this study performed well, especially the RF model.

Keywords: Ewing sarcoma; SEER; lymph node metastasis; machine learning; multi-center; web calculator.