Why do users override alerts? Utilizing large language model to summarize comments and optimize clinical decision support

Siru Liu; Allison B McCoy; Aileen P Wright; Scott D Nelson; Sean S Huang; Hasan B Ahmad; Sabrina E Carro; Jacob Franklin; James Brogan; Adam Wright

doi:10.1093/jamia/ocae041


J Am Med Inform Assoc. 2024 Mar 7:ocae041. doi: 10.1093/jamia/ocae041. Online ahead of print.

Authors

Siru Liu¹,², Allison B McCoy¹, Aileen P Wright¹,³, Scott D Nelson¹, Sean S Huang¹,³, Hasan B Ahmad⁴, Sabrina E Carro⁵, Jacob Franklin³, James Brogan³, Adam Wright¹,³

Affiliations

¹ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States.
² Department of Computer Science, Vanderbilt University, Nashville, TN 37212, United States.
³ Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37212, United States.
⁴ Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, United States.
⁵ Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37212, United States.

PMID: 38452289
DOI: 10.1093/jamia/ocae041

Abstract

Objectives: To evaluate the capability of generative artificial intelligence (AI) to summarize alert comments and to determine whether AI-generated summaries could be used to improve clinical decision support (CDS) alerts.

Materials and methods: We extracted user comments on alerts generated from September 1, 2022, to September 1, 2023, at Vanderbilt University Medical Center. For a subset of 8 alerts, comment summaries were generated independently by 2 physicians and then separately by GPT-4. We surveyed 5 CDS experts, who rated the human-generated and AI-generated summaries on a scale from 1 (strongly disagree) to 5 (strongly agree) for 4 metrics: clarity, completeness, accuracy, and usefulness.
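The abstract does not describe how comments were prepared or what prompt was given to GPT-4. As an illustration only, the sketch below shows one plausible preparation step under stated assumptions: comments arrive as hypothetical `(alert_id, comment_text)` pairs, blank and case-insensitive duplicate comments are dropped per alert, and the remaining comments are packed into a numbered prompt. The function names and prompt wording are hypothetical, and no model API call is shown.

```python
from collections import defaultdict


def group_comments_by_alert(records):
    """Group (alert_id, comment) pairs by alert, dropping blank and repeated comments.

    Hypothetical input schema: the study's actual extraction format is not
    described in the abstract.
    """
    grouped = defaultdict(list)
    seen = defaultdict(set)
    for alert_id, comment in records:
        text = " ".join(comment.split())  # collapse runs of whitespace
        key = text.lower()
        if not text or key in seen[alert_id]:
            continue  # skip empty or case-insensitive duplicate comments
        seen[alert_id].add(key)
        grouped[alert_id].append(text)
    return dict(grouped)


def build_summary_prompt(alert_name, comments, max_comments=200):
    """Assemble an illustrative summarization prompt (wording is hypothetical)."""
    lines = [
        f"The following are clinician override comments for the alert '{alert_name}'.",
        "Summarize the main reasons users override this alert and suggest "
        "possible improvements to the alert.",
        "",
    ]
    # Number each comment; cap the count to keep the prompt a bounded size.
    for i, comment in enumerate(comments[:max_comments], start=1):
        lines.append(f"{i}. {comment}")
    return "\n".join(lines)


if __name__ == "__main__":
    records = [
        ("A1", "Patient already on this med"),
        ("A1", "patient already on  this med"),  # duplicate after normalization
        ("A2", "Not applicable"),
    ]
    grouped = group_comments_by_alert(records)
    print(build_summary_prompt("Duplicate order", grouped["A1"]))
```

Deduplicating before prompting reflects the abstract's note that raw comments contain redundant content; whether the authors filtered comments this way is not stated.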

Results: Five CDS experts participated in the survey. A total of 16 human-generated summaries and 8 AI-generated summaries were assessed. Of the 8 top-rated summaries, 5 were generated by GPT-4. AI-generated summaries demonstrated high levels of clarity, accuracy, and usefulness, similar to the human-generated summaries. Moreover, AI-generated summaries exhibited significantly higher completeness and usefulness than the human-generated summaries (AI: 3.4 ± 1.2, human: 2.7 ± 1.2, P = .001).
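Results such as "AI: 3.4 ± 1.2, human: 2.7 ± 1.2" are mean ± standard deviation of 1-5 ratings. The sketch below shows how such figures could be computed from per-rating rows, assuming a hypothetical `(source, metric, rating)` layout and the sample standard deviation (n - 1 denominator). The abstract states neither the SD definition nor the significance test used, so no test is implemented here, and the example rows are invented for illustration, not study data.

```python
from statistics import mean, stdev


def summarize_ratings(ratings):
    """Return 'mean ± SD' for a list of 1-5 Likert ratings.

    Assumes the sample standard deviation (n - 1 denominator); the abstract
    does not say which definition the authors used.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate an SD")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers on the 1-5 scale")
    return f"{mean(ratings):.1f} ± {stdev(ratings):.1f}"


def summarize_by_source(rows, metric):
    """Collect ratings for one metric, grouped by summary source (e.g. 'AI', 'human')."""
    by_source = {}
    for source, row_metric, rating in rows:
        if row_metric == metric:
            by_source.setdefault(source, []).append(rating)
    return {source: summarize_ratings(vals) for source, vals in by_source.items()}


if __name__ == "__main__":
    # Hypothetical illustrative rows, not data from the study.
    rows = [
        ("AI", "completeness", 4),
        ("AI", "completeness", 2),
        ("human", "completeness", 3),
        ("human", "completeness", 1),
        ("AI", "clarity", 5),
    ]
    print(summarize_by_source(rows, "completeness"))
```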

Conclusion: End-user comments capture clinicians' immediate feedback on CDS alerts and can serve as a direct and valuable data source for improving CDS delivery. Traditionally, these comments may not be considered in the CDS review process because of their unstructured nature, large volume, and redundant or irrelevant content. Our study demonstrates that GPT-4 can distill these comments into summaries with high clarity, accuracy, and completeness. AI-generated summaries are equivalent to, and potentially better than, human-generated summaries. These AI-generated summaries could give CDS experts a novel means of reviewing user comments to rapidly optimize CDS alerts both online and offline.

Keywords: alert fatigue; clinical decision support; health personnel; large language model.

Grants and funding

R00LM014097-02/GF/NIH HHS/United States