Machine learning algorithms are comparable to conventional regression models in predicting distant metastasis of follicular thyroid carcinoma

Clin Endocrinol (Oxf). 2023 Jan;98(1):98-109. doi: 10.1111/cen.14693. Epub 2022 Feb 25.

Abstract

Objective: Distant metastasis often indicates a poor prognosis, so early screening and diagnosis are important. This study aimed to construct and validate a predictive model, based on machine learning (ML) algorithms, that estimates the risk of distant metastasis in newly diagnosed follicular thyroid carcinoma (FTC).

Design: This was a retrospective study based on the Surveillance, Epidemiology, and End Results (SEER) database from 2004 to 2015.

Patients: A total of 5809 FTC patients were included in the data analysis, of whom 214 (3.68%) had distant metastasis.

Method: Univariate and multivariate logistic regression (LR) analyses were used to identify independent risk factors. Seven commonly used ML algorithms were applied to construct predictive models, and the area under the receiver operating characteristic curve (AUROC) was used to select the best-performing algorithm. The optimal model was trained with 10-fold cross-validation and visualized with SHapley Additive exPlanations (SHAP). Finally, it was compared with the traditional LR method.
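
The evaluation loop described in the Method can be sketched roughly as follows. This is a minimal, standard-library illustration and not the authors' code: the function names (`auroc`, `stratified_folds`, `cross_validated_auroc`) and the `fit` callable are hypothetical, the stratified fold assignment is an assumption (the abstract only states "10-fold cross-validation"), and averaging the per-fold AUROCs is one common convention among several. Model fitting (XGBoost, LR) is deliberately left behind the `fit` placeholder.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Assign every sample index to one of k folds, spreading each class
    evenly so that rare positives appear in every fold (an assumption)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auroc(features, labels, fit, k=10, seed=0):
    """Mean held-out AUROC over k folds.

    `fit(X, y)` is a hypothetical placeholder for any learner; it must
    return a function that maps one feature row to a risk score.
    """
    aucs = []
    for fold in stratified_folds(labels, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        score = fit([features[i] for i in train], [labels[i] for i in train])
        aucs.append(auroc([labels[i] for i in fold],
                          [score(features[i]) for i in fold]))
    return sum(aucs) / len(aucs)


# Toy usage with synthetic data (not SEER data): one informative feature
# and a trivial "model" that returns that feature as the risk score.
toy_labels = [1] * 20 + [0] * 80
toy_features = [[float(y)] for y in toy_labels]
toy_fit = lambda X, y: (lambda row: row[0])
print(cross_validated_auroc(toy_features, toy_labels, toy_fit))
```

With roughly 3.7% positive cases, as in this cohort, an unstratified split could leave some folds with very few metastatic cases, which is why the sketch stratifies; whether the study did so is not stated in the abstract.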

Results: For predicting distant metastasis, the AUROCs of the seven ML algorithms ranged from 0.746 to 0.836 in the test set. Among them, Extreme Gradient Boosting (XGBoost) had the best prediction performance, with an AUROC of 0.836 (95% confidence interval [CI]: 0.775-0.897). After 10-fold cross-validation, its AUROC reached 0.855 (95% CI: 0.803-0.906), slightly higher than that of the classic binary LR model [AUROC: 0.845 (95% CI: 0.818-0.873)].
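
For readers who want to see how a 95% CI around an AUROC can be obtained, the sketch below uses the Hanley-McNeil standard-error approximation. This choice is an assumption: the abstract does not say which interval method was used, nor how many positive and negative cases sat behind each estimate. The function name `hanley_mcneil_ci` is hypothetical, and the whole-cohort counts (214 with and 5595 without distant metastasis) are passed only as illustrative inputs, so the result is not expected to reproduce the reported bounds.

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate confidence interval for an AUROC using the
    Hanley-McNeil standard error (z=1.96 gives a 95% interval)."""
    if n_pos < 1 or n_neg < 1:
        raise ValueError("need at least one positive and one negative case")
    q1 = auc / (2.0 - auc)              # P(two random positives both outrank one negative)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)   # P(one positive outranks two random negatives)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    half_width = z * math.sqrt(variance)
    # Clamp to the valid AUROC range [0, 1].
    return max(0.0, auc - half_width), min(1.0, auc + half_width)


# Illustrative inputs only: the LR point estimate from the abstract and
# the whole-cohort class counts (assumed, not the study's evaluation set).
low, high = hanley_mcneil_ci(0.845, n_pos=214, n_neg=5595)
print(f"approximate 95% CI: {low:.3f}-{high:.3f}")
```

The interval width is driven mainly by the smaller class, so an estimate evaluated on a test subset with fewer metastatic cases would have a wider interval than one evaluated on the full cohort, consistent with the XGBoost test-set interval being wider than the LR interval.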

Conclusions: The XGBoost approach was comparable to the conventional LR method for predicting the risk of distant metastasis in FTC.

Keywords: SHapley Additive exPlanations; XGBoost model; distant metastasis; follicular thyroid carcinoma; machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenocarcinoma, Follicular*
  • Algorithms
  • Humans
  • Machine Learning
  • Retrospective Studies
  • Thyroid Neoplasms* / diagnosis