Machine learning for identifying benign and malignant of thyroid tumors: A retrospective study of 2,423 patients

Front Public Health. 2022 Sep 14:10:960740. doi: 10.3389/fpubh.2022.960740. eCollection 2022.

Abstract

Thyroid tumors are among the most common tumors of the endocrine system, yet discrimination between benign and malignant thyroid tumors remains insufficient. The aim of this study was to construct a diagnostic model of benign and malignant thyroid tumors, in order to provide an emerging auxiliary diagnostic method for patients with thyroid tumors. Patients were selected from Chongqing General Hospital (Chongqing, China) from July 2020 to September 2021. Peripheral blood, BRAFV600E gene, and demographic indicators were selected as candidate predictors: sex, age, BRAFV600E gene, lymphocyte count (Lymph#), neutrophil count (Neu#), neutrophil/lymphocyte ratio (NLR), platelet/lymphocyte ratio (PLR), red blood cell distribution width (RDW), platelet count (PLT), red blood cell distribution width-coefficient of variation (RDW-CV), alkaline phosphatase (ALP), and parathyroid hormone (PTH). First, feature selection was performed by univariate analysis combined with least absolute shrinkage and selection operator (LASSO) analysis. We then used machine learning algorithms to establish three types of models: the first contains all predictors, the second contains the indicators retained after feature selection, and the third contains the patients' peripheral blood indicators. Four machine learning algorithms were used to build the predictive models: extreme gradient boosting (XGBoost), random forest (RF), light gradient boosting machine (LightGBM), and adaptive boosting (AdaBoost). A grid search algorithm was used to find the optimal parameters of the machine learning algorithms. Model performance was assessed with a series of indicators, such as the area under the curve (AUC). A total of 2,042 patients met the criteria and were enrolled in this study, and 12 variables were included. Sex, age, Lymph#, PLR, RDW, and BRAFV600E were identified as statistically significant indicators by univariate and LASSO analysis.
Among the models we constructed, RF, XGBoost, LightGBM, and AdaBoost achieved AUCs of 0.874 (95% CI, 0.841-0.906), 0.868 (95% CI, 0.834-0.901), 0.861 (95% CI, 0.826-0.895), and 0.837 (95% CI, 0.802-0.873), respectively, in the first model; 0.853 (95% CI, 0.818-0.888), 0.853 (95% CI, 0.818-0.889), 0.837 (95% CI, 0.800-0.873), and 0.832 (95% CI, 0.797-0.867) in the second model; and 0.698 (95% CI, 0.651-0.745), 0.688 (95% CI, 0.639-0.736), 0.693 (95% CI, 0.645-0.741), and 0.666 (95% CI, 0.618-0.714) in the third model. Compared with the existing models, our study proposes a model incorporating novel biomarkers, which could be a powerful and promising tool for predicting benign and malignant thyroid tumors.
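
As an illustration of two quantities the abstract relies on, the following minimal Python sketch computes the derived blood ratios (NLR, PLR) and an AUC with a 95% confidence interval. It is not the authors' code. The abstract does not state how the confidence intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names (`derived_ratios`, `auc`, `bootstrap_auc_ci`) and the toy labels and scores are hypothetical.

```python
import random


def derived_ratios(neutrophils, lymphocytes, platelets):
    """Return (NLR, PLR) from absolute counts given in the same units."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes, platelets / lymphocytes


def auc(labels, scores):
    """AUC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; AUC is undefined, so redraw
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data for illustration only (1 = malignant, 0 = benign).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=500)
```

In practice the scores would be the predicted probabilities of a fitted classifier on held-out patients; the toy values above only exercise the functions.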

Keywords: BRAFV600E gene mutation; machine learning; predictive model; risk-factors; thyroid tumor.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alkaline Phosphatase*
  • Humans
  • Machine Learning
  • Parathyroid Hormone
  • Retrospective Studies
  • Thyroid Neoplasms*

Substances

  • Parathyroid Hormone
  • Alkaline Phosphatase