A retrieval-augmented chatbot based on GPT-4 provides appropriate differential diagnosis in gastrointestinal radiology: a proof of concept study

Stephan Rau; Alexander Rau; Johanna Nattenmüller; Anna Fink; Fabian Bamberg; Marco Reisert; Maximilian F Russe

doi:10.1186/s41747-024-00457-x

Eur Radiol Exp. 2024 May 17;8(1):60. doi: 10.1186/s41747-024-00457-x.

Authors

Stephan Rau¹, Alexander Rau²,³, Johanna Nattenmüller², Anna Fink², Fabian Bamberg², Marco Reisert², Maximilian F Russe²

Affiliations

¹ Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg Im Breisgau, Germany. stephan.rau@uniklinik-freiburg.de.
² Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg Im Breisgau, Germany.
³ Department of Neuroradiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, Hugstetter Str. 55, 79106, Freiburg Im Breisgau, Germany.

Abstract

Background: We investigated the potential of an imaging-aware GPT-4-based chatbot in providing diagnoses based on imaging descriptions of abdominal pathologies.

Methods: Using zero-shot learning via the LlamaIndex framework, GPT-4 was enhanced with the 96 documents from the Radiographics Top 10 Reading List on gastrointestinal imaging, creating a gastrointestinal imaging-aware chatbot (GIA-CB). To assess its diagnostic capability, 50 cases covering a variety of abdominal pathologies were created, comprising radiological findings in fluoroscopy, MRI, and CT. We compared the GIA-CB to the generic GPT-4 chatbot (g-CB) in providing the primary and two additional differential diagnoses, using interpretations from senior-level radiologists as ground truth. The trustworthiness of the GIA-CB was evaluated by investigating the source documents provided by the knowledge-retrieval mechanism. The Mann-Whitney U test was used for statistical comparison.
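The knowledge-retrieval step described in the Methods can be illustrated with a minimal sketch. This is not the study's implementation: it uses neither LlamaIndex nor GPT-4, it replaces embedding-based retrieval with a simple bag-of-words cosine similarity, and the three document excerpts, the function names (`retrieve`, `build_prompt`), and the prompt wording are all hypothetical placeholders rather than content from the 96 source documents. It only shows the general idea of ranking reference texts against a findings description and returning the sources alongside the prompt so they can be inspected.

```python
import math
import re
from collections import Counter

# Hypothetical excerpts for illustration only; NOT taken from the study's documents.
DOCUMENTS = {
    "doc_achalasia": "Achalasia: barium swallow shows a dilated esophagus with "
                     "smooth bird beak tapering at the gastroesophageal junction.",
    "doc_volvulus": "Sigmoid volvulus: radiograph shows a coffee bean shaped dilated "
                    "sigmoid loop; CT shows a whirl sign of the mesentery.",
    "doc_intussusception": "Intussusception: CT shows a target sign with bowel "
                           "within bowel appearance.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def build_prompt(findings, documents, top_k=2):
    """Assemble a prompt from retrieved excerpts and return it with its sources."""
    sources = retrieve(findings, documents, top_k)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in sources)
    prompt = (
        "Use the reference excerpts to suggest a primary and two additional "
        "differential diagnoses, citing the excerpt ids.\n\n"
        f"Reference excerpts:\n{context}\n\n"
        f"Imaging findings: {findings}"
    )
    return prompt, sources


prompt, sources = build_prompt(
    "Barium swallow with dilated esophagus and bird beak tapering", DOCUMENTS
)
print(sources)
print(prompt)
```

In a real system the assembled prompt would be sent to a language model, and the returned source ids are what would allow a reader to check which documents informed the answer.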

Results: The GIA-CB identified the most appropriate differential diagnosis in 39/50 cases (78%), significantly more often than the g-CB with 27/50 cases (54%) (p = 0.006). Notably, the GIA-CB included the primary differential among its top 3 differential diagnoses in 45/50 cases (90%) versus 37/50 cases (74%) for the g-CB (p = 0.022), always with appropriate explanations. The median response time was 29.8 s for GIA-CB and 15.7 s for g-CB, and the mean cost per case was $0.15 and $0.02, respectively.
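The reported proportions and the time and cost trade-off can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the Results; the variable and function names are illustrative, and it does not attempt to reproduce the p-values, since the per-case scores are not given here.

```python
def percent(hits, total):
    """Proportion of hits expressed as a percentage."""
    return 100.0 * hits / total


TOTAL_CASES = 50

# Case counts as reported in the Results.
top1 = {"GIA-CB": 39, "g-CB": 27}   # most appropriate diagnosis named first
top3 = {"GIA-CB": 45, "g-CB": 37}   # primary diagnosis within the top 3

top1_pct = {name: percent(n, TOTAL_CASES) for name, n in top1.items()}
top3_pct = {name: percent(n, TOTAL_CASES) for name, n in top3.items()}

# Absolute differences in percentage points.
top1_gain = top1_pct["GIA-CB"] - top1_pct["g-CB"]
top3_gain = top3_pct["GIA-CB"] - top3_pct["g-CB"]

# Overhead of retrieval: median response time (s) and mean cost per case (USD).
time_ratio = 29.8 / 15.7
cost_ratio = 0.15 / 0.02

print(top1_pct, top3_pct)
print(f"top-1 gain: {top1_gain:.0f} points, top-3 gain: {top3_gain:.0f} points")
print(f"time ratio: {time_ratio:.1f}x, cost ratio: {cost_ratio:.1f}x")
```

In these terms, the GIA-CB gained 24 percentage points in top-1 and 16 percentage points in top-3 accuracy, at roughly 1.9 times the response time and 7.5 times the cost per case.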

Conclusions: The GIA-CB not only provided an accurate diagnosis for gastrointestinal pathologies, but also direct access to source documents, providing insight into the decision-making process, a step towards trustworthy and explainable AI. Integrating context-specific data into AI models can support evidence-based clinical decision-making.

Relevance statement: A context-aware GPT-4 chatbot demonstrates high accuracy in providing differential diagnoses based on imaging descriptions, surpassing the generic GPT-4. It provided formulated rationale and source excerpts supporting the diagnoses, thus enhancing trustworthy decision-support.

Key points:
• Knowledge retrieval enhances differential diagnoses in a gastrointestinal imaging-aware chatbot (GIA-CB).
• GIA-CB outperformed the generic counterpart, providing formulated rationale and source excerpts.
• GIA-CB has the potential to pave the way for AI-assisted decision support systems.

Keywords: Artificial intelligence; Diagnosis (differential); Gastrointestinal diseases; Knowledge acquisition (computer); Zero-shot learning.

MeSH terms

Diagnosis, Differential
Gastrointestinal Diseases / diagnostic imaging
Humans
Proof of Concept Study*