Assessment of the statistical optimization strategies and clinical evaluation of an artificial intelligence-based automated diagnostic system for thyroid nodule screening

Quant Imaging Med Surg. 2023 Feb 1;13(2):695-706. doi: 10.21037/qims-22-85. Epub 2023 Jan 2.

Abstract

Background: Thyroid cancer is the most common endocrine cancer in the world. Accurately distinguishing between benign and malignant thyroid nodules is particularly important for the early diagnosis and treatment of thyroid cancer. This study aimed to investigate the best possible optimization strategies for an already-trained artificial intelligence (AI)-based automated diagnostic system for thyroid nodule screening and, in addition, to scrutinize the clinically relevant limitations using stratified analysis to better standardize the application in clinical workflows.

Methods: We retrospectively reviewed a total of 1,092 ultrasound images associated with 397 thyroid nodules collected from 287 patients between April 2019 and January 2021, applying postoperative pathology as the gold standard. We applied different statistical approaches, including averages, maximums, and percentiles, to estimate per-nodule malignancy scores from the per-image malignancy scores predicted by the AI-SONIC Thyroid v. 5.3.0.2 system (Demetics Medical Technology Ltd., Hangzhou, China), and we assessed the system's diagnostic efficacy on nodules of different sizes or tumor types using per-nodule performance metrics.
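
The aggregation step described in the Methods can be illustrated with a minimal Python sketch. The function names `percentile` and `nodule_score`, the strategy labels, and the example scores are hypothetical and are not taken from the AI-SONIC Thyroid system; the percentile uses linear interpolation between ranks, which is only one of several common conventions.

```python
from statistics import mean, median


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one score is required")
    pos = (len(ordered) - 1) * q / 100.0   # fractional rank
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def nodule_score(image_scores, strategy="median", q=50):
    """Collapse the per-image malignancy scores of one nodule into one score.

    strategy is a hypothetical label: "mean", "max", "median", or "percentile".
    """
    if not image_scores:
        raise ValueError("at least one image score is required")
    if strategy == "mean":
        return mean(image_scores)
    if strategy == "max":
        return max(image_scores)
    if strategy == "median":
        return median(image_scores)
    if strategy == "percentile":
        return percentile(image_scores, q)
    raise ValueError(f"unknown strategy: {strategy}")


# Illustrative (made-up) per-image scores for a single nodule.
example_scores = [0.2, 0.8, 0.6]
for name in ("mean", "max", "median"):
    print(name, nodule_score(example_scores, name))
```

Each strategy trades robustness against sensitivity to a single suspicious image: the maximum reacts to one high-scoring image, whereas the median ignores isolated outliers.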

Results: Of the 397 thyroid nodules, 272 were malignant according to the results of the surgical pathological examinations, so malignant nodules were overrepresented in the sample. When the median of the per-image malignancy scores was used to estimate the nodule-based score, with a cutoff value of 0.56 optimized for the mean of sensitivity and specificity, the AI-based automated detection system demonstrated slightly lower sensitivity, significantly higher specificity (almost independent of nodule size), and similar accuracy compared with the senior radiologist. Both the AI system and the senior radiologist demonstrated higher sensitivity in diagnosing smaller nodules (≤25 mm) and comparable diagnostic performances for larger nodules. The mean diagnostic time per nodule of the AI system was 0.146 s, in sharp contrast to the 2.8 to 4.5 min required by the radiologists.
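
One way to choose a cutoff that maximizes the mean of sensitivity and specificity is an exhaustive search over the observed scores, sketched below in Python. This is an illustration under stated assumptions, not the authors' procedure: the function names, the rule that a score at or above the cutoff counts as malignant, and the toy data are all hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Compute (sensitivity, specificity) at a cutoff.

    labels: 1 = malignant, 0 = benign (pathology reference).
    Assumption: score >= cutoff is called malignant.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_malignant = score >= cutoff
        if label == 1:
            if predicted_malignant:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_malignant:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both malignant and benign nodules are required")
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Return (cutoff, mean of sensitivity and specificity) for the best cutoff.

    Every distinct observed score is tried; ties keep the lowest cutoff.
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        value = (sens + spec) / 2
        if best is None or value > best[1]:
            best = (cutoff, value)
    return best


# Toy nodule-level scores and pathology labels (made up for illustration).
toy_scores = [0.1, 0.3, 0.4, 0.7, 0.9]
toy_labels = [0, 0, 1, 1, 1]
print(best_cutoff(toy_scores, toy_labels))
```

Because the criterion weights sensitivity and specificity equally, the selected cutoff does not depend on the proportion of malignant nodules in the sample, which is relevant when malignant cases are overrepresented.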

Conclusions: Using our optimization strategy to achieve nodule-based diagnosis, the AI-SONIC Thyroid automated diagnostic system demonstrated an overall diagnostic accuracy equivalent to that of senior radiologists. It is therefore expected to serve radiologists as a reliable auxiliary diagnostic method for the screening and preoperative evaluation of malignant thyroid nodules.

Keywords: AI automated diagnostic system; Thyroid nodule; artificial intelligence (AI); radiologists.