"ChatGPT, Can You Help Me Save My Child's Life?" - Diagnostic Accuracy and Supportive Capabilities to Lay Rescuers by ChatGPT in Prehospital Basic Life Support and Paediatric Advanced Life Support Cases - An In-silico Analysis

Stefan Bushuven; Michael Bentele; Stefanie Bentele; Bianka Gerber; Joachim Bansbach; Julian Ganter; Milena Trifunovic-Koenig; Robert Ranisch

doi:10.1007/s10916-023-02019-x

J Med Syst. 2023 Nov 21;47(1):123. doi: 10.1007/s10916-023-02019-x.

Authors

Stefan Bushuven¹ ² ³, Michael Bentele⁴, Stefanie Bentele⁴, Bianka Gerber⁴, Joachim Bansbach⁵, Julian Ganter⁵, Milena Trifunovic-Koenig⁴, Robert Ranisch⁶

Affiliations

¹ Training Center for Emergency Medicine (NOTIS e.V), Breite Strasse 7, Engen, 78234, Germany. Stefan.Bushuven@notis-ev.de.
² Department of Anesthesiology and Critical Care, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany. Stefan.Bushuven@notis-ev.de.
³ Institute for Medical Education, University Hospital, LMU Munich, Munich, Germany. Stefan.Bushuven@notis-ev.de.
⁴ Training Center for Emergency Medicine (NOTIS e.V), Breite Strasse 7, Engen, 78234, Germany.
⁵ Department of Anesthesiology and Critical Care, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
⁶ Faculty for Health Sciences Brandenburg, University of Potsdam, Potsdam, Germany.

Abstract

Background: Paediatric emergencies are challenging for healthcare workers, first aiders, and parents waiting for emergency medical services to arrive. With the expected rise of virtual assistants, people will likely seek help from such digital AI tools, especially in regions lacking emergency medical services. Large Language Models like ChatGPT have proved effective in providing health-related information and perform competently in medical exams, but their patient safety has been questioned. Currently, there is no information on ChatGPT's performance in supporting parents in paediatric emergencies requiring help from emergency medical services. This study aimed to test the performance and safety of ChatGPT and GPT-4 on 20 paediatric and two basic life support case vignettes.

Methods: We presented each case three times to each of two models, ChatGPT and GPT-4, and assessed the diagnostic accuracy, the emergency call advice, and the validity of the advice given to parents.

Results: Both models recognized the emergency in the cases, except for septic shock and pulmonary embolism, and identified the correct diagnosis in 94%. However, ChatGPT/GPT-4 reliably advised calling emergency services in only 12 of 22 cases (54%), gave correct first aid instructions in 9 cases (45%), and incorrectly advised parents to use advanced life support techniques in 3 of 22 cases (13.6%).
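
The proportions above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' analysis code: the constant and function names are hypothetical, and it only covers the figures for which the abstract states a denominator (the denominator for the 9 first aid cases is not stated here, so it is left out). The total number of model responses is derived from the Methods description (22 cases, three repetitions, two models) and is not a figure reported in the abstract.

```python
# Figures quoted from the abstract; all names are illustrative assumptions.
N_CASES = 22        # 20 paediatric + 2 basic life support vignettes
N_REPETITIONS = 3   # each case presented three times
N_MODELS = 2        # ChatGPT and GPT-4


def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Derived, not reported: number of responses assessed in total.
total_responses = N_CASES * N_REPETITIONS * N_MODELS

# Cases with reliable advice to call emergency services.
emergency_call_pct = percent(12, N_CASES)

# Cases with incorrect advice to use advanced life support techniques.
als_advice_pct = percent(3, N_CASES)

print(total_responses, emergency_call_pct, als_advice_pct)
```

Under these assumptions the sketch gives 132 responses, 54.5% for emergency call advice (reported as 54% in the abstract) and 13.6% for advanced life support advice.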

Conclusion: Given these results for the recent ChatGPT versions, the validity, reliability, and thus safety of ChatGPT/GPT-4 as an emergency support tool are questionable. Whether humans would perform better in the same situation is uncertain, and other studies have shown that human emergency call operators are also inaccurate, in some cases performing worse than ChatGPT/GPT-4 did in our study. One of the main limitations of the study is that we used prototypical cases; management may differ between urban and rural areas and between countries, indicating the need for further evaluation of the context sensitivity and adaptability of the model. Nevertheless, ChatGPT and the new versions under development may be promising tools for assisting lay first responders, operators, and professionals in diagnosing a paediatric emergency.

Trial registration: Not applicable.

Keywords: Artificial intelligence; ChatGPT; First responder; GPT-4; Large language model; Medical didactics; Tele-medicine.

MeSH terms

Child
Emergencies*
Emergency Medical Services*
Health Personnel
Humans
Language
Reproducibility of Results