A Context-based Chatbot Surpasses Trained Radiologists and Generic ChatGPT in Following the ACR Appropriateness Guidelines

Radiology. 2023 Jul;308(1):e230970. doi: 10.1148/radiol.230970.

Abstract

Background: Radiologic imaging guidelines are crucial for accurate diagnosis and optimal patient care, as they standardize decisions and thus reduce inappropriate imaging studies.

Purpose: To investigate the potential to support clinical decision-making with an interactive chatbot designed to provide personalized imaging recommendations from American College of Radiology (ACR) Appropriateness Criteria documents using semantic similarity processing.

Methods: We used 209 ACR Appropriateness Criteria documents as a specialized knowledge base and employed LlamaIndex, a framework for connecting large language models with external data, together with GPT-3.5-Turbo to create an appropriateness criteria context-based chatbot (accGPT). Fifty clinical case files were used to compare the performance of accGPT against general radiologists of varying experience levels and against generic ChatGPT 3.5 and 4.0.

Results: All chatbots performed at least at the level of the radiologists. Across the 50 case files, accGPT performed best at providing correct recommendations rated "usually appropriate" according to the ACR criteria and gave the highest proportion of consistently correct answers compared with the generic chatbots and the radiologists. The chatbots also provided substantial time and cost savings: an average decision time of 5 minutes and a cost of €0.19 for all cases, compared with 50 minutes and €29.99 for the radiologists (both p < 0.01).

Conclusion: ChatGPT-based algorithms have the potential to substantially improve decision-making for clinical imaging studies in accordance with ACR guidelines. Specifically, the context-based algorithm outperformed its generic counterpart, demonstrating the value of tailoring AI solutions to specific healthcare applications.
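For readers unfamiliar with the retrieval setup described in the Methods, the sketch below shows how such a context-based pipeline can be assembled with LlamaIndex. It is a minimal illustration only: the abstract does not include the study's code, the class names follow a recent LlamaIndex release, and the folder path and query text are hypothetical placeholders.

    # Minimal sketch of a retrieval-augmented ("context-based") chatbot over
    # the ACR Appropriateness Criteria documents, in the spirit of accGPT.
    # Assumptions: llama-index >= 0.10 with the OpenAI integration installed;
    # the folder path and the clinical query are illustrative placeholders.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    from llama_index.llms.openai import OpenAI

    # Load the ACR Appropriateness Criteria documents from a local folder.
    documents = SimpleDirectoryReader("acr_appropriateness_criteria/").load_data()

    # Embed the documents into a vector index; at query time, the passages
    # most semantically similar to the clinical case are retrieved and passed
    # to the language model as context.
    index = VectorStoreIndex.from_documents(documents)

    # Answer a clinical vignette with GPT-3.5-Turbo grounded in the retrieved
    # guideline text rather than in the model's parametric knowledge alone.
    query_engine = index.as_query_engine(llm=OpenAI(model="gpt-3.5-turbo"))
    response = query_engine.query(
        "55-year-old patient with acute low back pain and no red flags: "
        "which initial imaging study is usually appropriate?"
    )
    print(response)

Grounding each answer in retrieved guideline passages, rather than relying on the model's training data alone, is the plausible source of the advantage that the context-based chatbot shows over the generic models in the Results.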

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Clinical Decision-Making
  • Cost Savings
  • Humans
  • Radiologists
  • Software*