Risk stratification of thyroid nodules: Assessing the suitability of ChatGPT for text-based analysis

Am J Otolaryngol. 2024 Mar-Apr;45(2):104144. doi: 10.1016/j.amjoto.2023.104144. Epub 2023 Dec 7.

Abstract

Purpose: Accurate risk stratification of thyroid nodules is essential for optimal patient management. This study aimed to assess the suitability of ChatGPT for risk stratification of thyroid nodules using a text-based evaluation.

Methods: A dataset was compiled comprising 50 anonymized clinical reports and associated risk assessments for thyroid nodules. The Chat Generative Pre-trained Transformer (ChatGPT) was used to classify sonographic patterns in accordance with the Thyroid Imaging Reporting and Data System (TI-RADS). The model's performance was assessed using various criteria, including sensitivity, specificity, and accuracy. A comparative analysis was conducted, evaluating the model against investigator-based risk stratification as well as histology.
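
As an illustration of the performance criteria named above, the following minimal Python sketch computes sensitivity, specificity, and accuracy for a dichotomized TI-RADS comparison. It is not the authors' analysis code: the function names, the treatment of "high-risk" as the positive class, and the six example categories are hypothetical and chosen only to show the arithmetic. The low-risk/high-risk split (TI-RADS 2 and 3 versus 4 and 5) follows the abstract; TI-RADS 1 is deliberately left unmapped because the abstract does not assign it to either group.

```python
# Hypothetical sketch: binary performance metrics for dichotomized TI-RADS.
# The mapping follows the abstract (2-3 = low risk, 4-5 = high risk).
# TI-RADS 1 is not mapped because the abstract does not assign it to a group.
RISK_GROUP = {2: "low", 3: "low", 4: "high", 5: "high"}


def dichotomize(tirads_categories):
    """Map TI-RADS categories to 'low' or 'high'; raises KeyError if unmapped."""
    return [RISK_GROUP[c] for c in tirads_categories]


def binary_metrics(reference, predicted, positive="high"):
    """Return sensitivity, specificity, and accuracy against a reference."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == positive and p == positive)
    tn = sum(1 for r, p in pairs if r != positive and p != positive)
    fp = sum(1 for r, p in pairs if r != positive and p == positive)
    fn = sum(1 for r, p in pairs if r == positive and p != positive)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


# Made-up example data (NOT taken from the study).
examiner = dichotomize([2, 3, 4, 5, 4, 3])
chatbot = dichotomize([3, 4, 4, 5, 2, 3])
metrics = binary_metrics(examiner, chatbot)
print(metrics)
```

In this sketch, sensitivity is the share of reference high-risk nodules that the model also labels high-risk, and specificity is the corresponding share for low-risk nodules.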

Results: ChatGPT agreed with the examiner-based evaluation (TI-RADS 1-5) in 42% of cases, suggesting moderate potential for predicting the risk of malignancy in thyroid nodules from text-based reports. When distinguishing between low-risk (TI-RADS 2 and 3) and high-risk (TI-RADS 4 and 5) categories, the chatbot model achieved a sensitivity of 86.7%, a specificity of 10.7%, and an overall accuracy of 68%. Interrater reliability, measured with Cohen's kappa, was 0.686.
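
For readers unfamiliar with Cohen's kappa, the sketch below shows how the statistic corrects raw agreement for agreement expected by chance. It assumes the unweighted form of kappa; the abstract does not state whether a weighted variant was used. The function name and the example ratings are hypothetical and are not study data.

```python
# Hypothetical sketch of unweighted Cohen's kappa for two raters.
# kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
# p_e is the agreement expected by chance from each rater's marginal counts.
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up TI-RADS categories (NOT taken from the study).
examiner_ratings = [2, 3, 4, 5, 4, 3]
chatbot_ratings = [3, 4, 4, 5, 2, 3]
kappa = cohens_kappa(examiner_ratings, chatbot_ratings)
print(round(kappa, 3))
```

With these made-up ratings the observed agreement is 3/6 and the chance agreement is 10/36, giving a kappa of 4/13 (about 0.308).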

Conclusion: This study highlights the potential of ChatGPT in assisting clinicians with risk stratification of thyroid nodules. The results suggest that ChatGPT can facilitate personalized treatment decisions, although the agreement rate is still low. Further research and validation studies are necessary to establish the clinical applicability and generalizability of ChatGPT in routine practice. The integration of ChatGPT into clinical workflows has the potential to enhance thyroid nodule risk assessment and improve patient care.

Keywords: AI; ChatGPT; Risk stratification; Thyroid nodules; Ultrasound.

MeSH terms

  • Humans
  • Reproducibility of Results
  • Retrospective Studies
  • Risk Assessment
  • Thyroid Nodule* / diagnostic imaging
  • Thyroid Nodule* / pathology
  • Ultrasonography / methods