An External-Validated Prediction Model to Predict Lung Metastasis among Osteosarcoma: A Multicenter Analysis Based on Machine Learning

Comput Intell Neurosci. 2022 May 6:2022:2220527. doi: 10.1155/2022/2220527. eCollection 2022.

Abstract

Background: Lung metastasis strongly influences treatment strategy in osteosarcoma. This study aimed to develop and validate a clinical model, based on machine learning (ML) algorithms, for predicting the risk of lung metastasis in osteosarcoma patients.

Methods: We retrospectively collected data on osteosarcoma patients from the Surveillance, Epidemiology, and End Results (SEER) database and from four hospitals in China. Six ML algorithms, namely logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), random forest (RF), decision tree (DT), and multilayer perceptron (MLP), were applied to build models for predicting lung metastasis from patients' demographics, clinical characteristics, and therapeutic variables in the SEER database. Each model was internally validated using 10-fold cross-validation to calculate the mean area under the curve (AUC) and externally validated using the Chinese multicenter osteosarcoma data. The relative importance of the predictors was ranked and plotted to show how much each predictor contributed in the different ML algorithms, and a correlation heat map was plotted to examine the correlations among predictors. The model with the highest AUC on the external validation receiver operating characteristic (ROC) curve was selected to build a web calculator.
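The internal validation step, a mean AUC over 10-fold cross-validation, can be illustrated with a minimal sketch. This is not the study's code: the data are synthetic, the stratified fold assignment is an assumption, and `fit_centroid` is a hypothetical toy scorer standing in for any of the six learners; only the Python standard library is used.

```python
import random
from statistics import mean


def auc(labels, scores):
    """AUC as the chance that a random positive case outscores a random
    negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Assign indices to k folds, dealing each class out in turn so that
    every fold keeps a similar share of positives (an assumption here)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def mean_cv_auc(X, y, fit, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the k AUCs.
    `fit(train_X, train_y)` must return a function mapping a row to a score."""
    fold_aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_aucs.append(
            auc([y[i] for i in test_idx], [score(X[i]) for i in test_idx])
        )
    return mean(fold_aucs)


def fit_centroid(train_X, train_y):
    """Hypothetical toy learner: project each row onto the difference
    between the class means. A stand-in for LR, GBM, XGBoost, RF, DT, or MLP."""
    pos = [x for x, t in zip(train_X, train_y) if t == 1]
    neg = [x for x, t in zip(train_X, train_y) if t == 0]
    dims = range(len(train_X[0]))
    w = [mean(r[j] for r in pos) - mean(r[j] for r in neg) for j in dims]
    return lambda row: sum(wj * xj for wj, xj in zip(w, row))


# Synthetic cohort (not SEER data): 40 positives, 160 negatives, two features,
# of which only the first carries signal.
rng = random.Random(42)
y = [1] * 40 + [0] * 160
X = [[rng.gauss(1.0 if t else 0.0, 1.0), rng.gauss(0.0, 1.0)] for t in y]

cv_auc = mean_cv_auc(X, y, fit_centroid, k=10)
print(f"mean 10-fold AUC: {cv_auc:.3f}")
```

External validation follows the same scoring logic without the fold loop: the model is fitted once on the full development cohort and `auc` is computed on the independent cohort's labels and predicted scores.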

Results: Among the patients enrolled from the SEER database, 17.73% (194/1094) developed lung metastasis. Multiple logistic regression analysis showed that sex, N stage, T stage, surgery, and bone metastasis were all independent risk factors for lung metastasis. In predicting lung metastasis, the mean AUCs of the six ML algorithms ranged from 0.711 to 0.738 in internal validation and from 0.697 to 0.729 in external validation. Among the six ML algorithms, the XGBoost model had the highest AUC, with a mean internal AUC of 0.738 and an external AUC of 0.729. This best-performing model was used to build a web calculator that helps clinicians calculate the risk of lung metastasis for each patient.

Conclusions: The XGBoost model may have the best predictive performance, and the online calculator based on this model can help doctors estimate the lung metastasis risk of osteosarcoma patients and plan individualized treatment strategies.

Publication types

  • Multicenter Study

MeSH terms

  • Bone Neoplasms*
  • Humans
  • Lung Neoplasms* / diagnosis
  • Machine Learning
  • Models, Statistical
  • Osteosarcoma*
  • Prognosis
  • Retrospective Studies