Assessing AI-Powered Patient Education: A Case Study in Radiology

Ian J Kuckelman; Paul H Yi; Molinna Bui; Ifeanyi Onuh; Jade A Anderson; Andrew B Ross

doi:10.1016/j.acra.2023.08.020

Acad Radiol. 2024 Jan;31(1):338-342. doi: 10.1016/j.acra.2023.08.020. Epub 2023 Sep 14.

Authors

Ian J Kuckelman¹, Paul H Yi², Molinna Bui³, Ifeanyi Onuh³, Jade A Anderson³, Andrew B Ross³

Affiliations

¹ University of Wisconsin School of Medicine and Public Health, 750 Highland Ave, Madison, WI 53705. Electronic address: kuckelman@wisc.edu.
² University of Maryland School of Medicine, Baltimore, Maryland.
³ University of Wisconsin School of Medicine and Public Health, 750 Highland Ave, Madison, WI 53705.

PMID: 37709612

Abstract

Rationale and objectives: With recent advancements in the power and accessibility of artificial intelligence (AI) large language models (LLMs), patients might increasingly turn to these platforms to answer questions regarding radiologic examinations and procedures, despite valid concerns about the accuracy of the information provided. This study aimed to assess the accuracy and completeness of the patient education information that the Bing Chatbot, an LLM powered by ChatGPT, provides for common radiologic exams.

Materials and methods: We selected three common radiologic examinations and procedures: computed tomography (CT) abdomen, magnetic resonance imaging (MRI) spine, and bone biopsy. For each, ten questions were tested on the chatbot in two trials using three different chatbot settings. Two reviewers independently assessed the chatbot's responses for accuracy and completeness compared to an accepted online resource, radiologyinfo.org.

Results: Of the 360 reviews performed, 336 (93%) were rated "entirely correct" and 24 (7%) were rated "mostly correct," indicating a high level of reliability. Completeness ratings showed that 65% were "complete" and 35% were "mostly complete." The "More Creative" chatbot setting produced a higher proportion of responses rated "entirely correct," but there were otherwise no significant differences in ratings based on chatbot setting or exam type. The responses were rated at an eighth-grade reading level.
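
The count of 360 reviews follows from the design described in Materials and methods: three exams, ten questions each, two trials, three chatbot settings, and two reviewers. The following minimal Python sketch reproduces that arithmetic and the reported accuracy percentages. It is an illustration only, not the authors' analysis code. The reviewer labels are placeholders, and of the three setting names only "More Creative" appears in the abstract; the other two are assumed.

```python
from itertools import product

# Study design factors as described in the abstract.
exams = ["CT abdomen", "MRI spine", "bone biopsy"]
questions = range(1, 11)            # ten questions per exam
trials = (1, 2)                     # two trials per question
# Only "More Creative" is named in the abstract; the other two names are assumed.
settings = ["More Creative", "More Balanced", "More Precise"]
reviewers = ("reviewer A", "reviewer B")  # placeholder labels

# One chatbot response per (exam, question, trial, setting) combination.
responses = list(product(exams, questions, trials, settings))
# Each response is rated independently by both reviewers.
reviews = list(product(responses, reviewers))

total_responses = len(responses)
total_reviews = len(reviews)

# Accuracy counts reported in the Results.
entirely_correct = 336
mostly_correct = 24


def pct(count: int, total: int) -> int:
    """Return count as a percentage of total, rounded to a whole number."""
    return round(100 * count / total)


print(f"responses: {total_responses}, reviews: {total_reviews}")
print(f"entirely correct: {pct(entirely_correct, total_reviews)}%")
print(f"mostly correct: {pct(mostly_correct, total_reviews)}%")
```

Under these assumptions the design yields 180 chatbot responses and 360 reviews, and the two accuracy categories together account for every review, consistent with the statement that no response was rated inaccurate.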

Conclusion: The Bing Chatbot provided accurate responses that answered all or most aspects of each question asked of it, with responses tending to err on the side of caution for nuanced questions. Importantly, no responses were inaccurate or had the potential to cause harm or confusion for the user. Thus, LLM chatbots demonstrate potential to enhance patient education in radiology and could be integrated into patient portals for various purposes, including exam preparation and results interpretation.

Keywords: Artificial intelligence; Bing Chatbot; Large language models; Patient education.

MeSH terms

Artificial Intelligence*
Humans
Patient Education as Topic
Radiography
Radiology*
Reproducibility of Results