Machine learning based on SEER database to predict distant metastasis of thyroid cancer

Endocrine. 2023 Dec 29. doi: 10.1007/s12020-023-03657-4. Online ahead of print.

Abstract

Objective: Distant metastasis of thyroid cancer often indicates a poor prognosis, so it is important to identify, as early as possible, patients who have already developed distant metastasis or who are at high risk of it. This study aimed to construct machine learning models that predict distant metastasis of thyroid cancer, to provide a reference for clinical diagnosis and treatment.

Materials & methods: Data on the demographic and clinicopathological characteristics of thyroid cancer patients from 2010 to 2015 were extracted from the National Institutes of Health (NIH) Surveillance, Epidemiology, and End Results (SEER) database. Univariate and then multivariate logistic regression models were used to screen for independent risk factors. Seven machine learning models, namely Decision Trees (DT), ElasticNet (ENET), Logistic Regression (LR), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Multilayer Perceptron (MLP), and Radial Basis Function Support Vector Machine (RBFSVM), were compared and evaluated using the following metrics: the area under the receiver operating characteristic curve (AUC), calibration curve, decision curve analysis (DCA), sensitivity (also called recall), specificity, precision, accuracy, and F1 score. Interpretable machine learning methods were used to identify possible associations between the variables and distant metastasis.
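For readers less familiar with the evaluation metrics listed above, the following is a minimal Python sketch of how several of them are conventionally computed for a binary outcome (1 = distant metastasis, 0 = none). It is not the authors' code: the function names, the toy labels and predicted probabilities, and the 0.5 classification threshold are all hypothetical. AUC is computed from its pairwise (Mann–Whitney) definition, and net benefit uses the standard decision curve analysis formula NB = TP/n − (FP/n) · p_t/(1 − p_t) at threshold probability p_t. Calibration curves are not covered.

```python
"""Hypothetical sketch of standard binary-classification metrics.

Not the study's implementation; data and threshold are illustrative only.
Label convention assumed here: 1 = distant metastasis, 0 = no distant metastasis.
"""
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity, precision, accuracy and F1 at one cut-off."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    c = confusion_counts(y_true, y_pred)
    tp, tn, fp, fn = c["tp"], c["tn"], c["fp"], c["fn"]
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    # F1 is the harmonic mean of precision and sensitivity (recall).
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative.

    Ties count as one half. O(P*N), fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float], pt: float) -> float:
    """Decision curve analysis: net benefit of treating patients with score >= pt."""
    n = len(y_true)
    y_pred = [1 if s >= pt else 0 for s in scores]
    c = confusion_counts(y_true, y_pred)
    return c["tp"] / n - (c["fp"] / n) * (pt / (1 - pt))


if __name__ == "__main__":
    # Toy data (hypothetical): true labels and model-predicted probabilities.
    y = [1, 1, 1, 0, 0, 0, 0, 1]
    s = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
    print(threshold_metrics(y, s, 0.5))
    print("AUC:", auc(y, s))
    for pt in (0.2, 0.5):
        print(f"net benefit at pt={pt}:", net_benefit(y, s, pt))
```

On this toy data the script reports sensitivity, specificity, precision, accuracy and F1 of 0.75 each and an AUC of 0.9375. A decision curve is obtained by evaluating `net_benefit` over a range of threshold probabilities. In a study such as this one, the metrics would be computed separately on training-set and test-set predictions.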

Results: Independent risk factors for distant metastasis, including age, gender, race, marital status, histological type, capsular invasion, and number of lymph node metastases, were identified by multivariate regression analysis. Among the seven machine learning algorithms, RF performed best, with an AUC of 0.948, sensitivity of 0.919, accuracy of 0.845, and F1 score of 0.886 in the training set, and an AUC of 0.960, sensitivity of 0.929, accuracy of 0.906, and F1 score of 0.908 in the test set.

Conclusions: The machine learning model constructed in this study can aid the early diagnosis of distant metastasis of thyroid cancer and help physicians make better clinical decisions and choose appropriate medical interventions.

Keywords: Distant metastasis; Machine learning; SEER database; Thyroid cancer.