Incorporation of a machine learning pathological diagnosis algorithm into the thyroid ultrasound imaging data improves the diagnosis risk of malignant thyroid nodules

Front Oncol. 2022 Dec 8:12:968784. doi: 10.3389/fonc.2022.968784. eCollection 2022.

Abstract

Objective: This study aimed at establishing a new model to predict malignant thyroid nodules using machine learning algorithms.

Methods: A retrospective study was performed on 274 patients with thyroid nodules who underwent fine-needle aspiration (FNA) cytology or surgery from October 2018 to 2020 in Xianyang Central Hospital. Least absolute shrinkage and selection operator (lasso) regression analysis and logistic regression analysis were applied to screen and identify variables. Six machine learning algorithms, namely Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), Naive Bayes Classifier (NBC), Random Forest (RF), and Logistic Regression (LR), were used to construct predictive models from preoperative clinical characteristics and ultrasound features, and their results were compared. Internal validation was performed using 10-fold cross-validation. Model performance was measured by the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score, and the models were interpreted using Shapley additive explanations (SHAP) plots, feature importance, and the correlation of features. The best cutoff value for risk stratification was identified by the probability density function (PDF) and the clinical utility curve (CUC).
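The validation scheme and metrics named above can be illustrated with a minimal sketch. This is not the authors' code: the function names (`k_fold_indices`, `classification_metrics`, `roc_auc`) are hypothetical, the fold assignment is a plain shuffled split (whether the study stratified its folds is not stated), and only the Python standard library is used.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: a simple shuffled, unstratified split.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = malignant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In a full pipeline, each of the six classifiers would be fitted on every training fold and scored on the matching test fold, and the per-fold metrics would then be averaged before the models are compared.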

Results: The malignancy rate of thyroid nodules in the study cohort was 53.2%. The predictive models were constructed using age, margin, shape, echogenic foci, echogenicity, and lymph nodes. The XGBoost model was significantly superior to the other machine learning models, with an AUC value of 0.829. According to the PDF and CUC, we recommend a predicted probability of 51% as the threshold for risk stratification of malignant nodules; at this threshold, about 85.6% of patients with malignant nodules could be detected, and approximately 89.8% of unnecessary biopsy procedures would be avoided. Finally, an online web risk calculator was built to estimate an individual's likelihood of malignant thyroid nodules based on the best-performing ML-based model, XGBoost.
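How a probability cutoff turns into the two rates reported above can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the function name `stratify_at_threshold` is hypothetical, a nodule is treated as high risk when its predicted probability is at or above the cutoff, and the share of avoided biopsies is assumed to mean the fraction of benign nodules falling below the cutoff (the abstract does not give the exact definition).

```python
def stratify_at_threshold(probabilities, y_true, threshold=0.51):
    """Return (detected_fraction, avoided_fraction) for a probability cutoff.

    detected_fraction: share of malignant nodules (label 1) flagged as high risk.
    avoided_fraction: share of benign nodules (label 0) left unflagged.
    Assumption: "flagged" means probability >= threshold.
    """
    malignant = sum(1 for t in y_true if t == 1)
    benign = sum(1 for t in y_true if t == 0)
    if malignant == 0 or benign == 0:
        raise ValueError("need at least one malignant and one benign case")
    detected = sum(
        1 for p, t in zip(probabilities, y_true) if t == 1 and p >= threshold
    )
    avoided = sum(
        1 for p, t in zip(probabilities, y_true) if t == 0 and p < threshold
    )
    return detected / malignant, avoided / benign
```

Sweeping `threshold` over a grid and plotting both fractions against it would give a curve similar in spirit to a clinical utility curve, from which a cutoff such as 0.51 could be read off.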

Conclusions: By combining clinical characteristics with features of ultrasound images, ML algorithms can achieve reliable prediction of malignant thyroid nodules. The online web risk calculator based on the XGBoost model can estimate the probability of malignant thyroid nodules in real time, which can assist clinicians in formulating individualized management strategies for patients.

Keywords: machine learning; malignant; predictive model; thyroid nodules; web calculator.