Large language models: Are artificial intelligence-based chatbots a reliable source of patient information for spinal surgery?

Eur Spine J. 2023 Oct 11. doi: 10.1007/s00586-023-07975-z. Online ahead of print.

Abstract

Purpose: Large language models (LLMs) have recently attracted attention because of their impressive performance. Based on artificial intelligence, LLMs enable dialogic communication in quasi-natural language that approaches the quality of human communication. LLMs could therefore play an important role in keeping patients informed. To evaluate the validity of an LLM in providing medical information, we used one of the first high-performance LLMs (ChatGPT) on the clinical example of acute lumbar disc herniation (LDH).

Methods: Twenty-four spinal surgeons experienced in LDH surgery directed questions to ChatGPT about the clinical picture of LDH from a patient's perspective. They evaluated the quality of ChatGPT's responses and its potential use in medical communication. The responses were compared with the information content of a standard informed consent form.

Results: ChatGPT performed well in terms of the comprehensibility, specificity, and satisfaction of its responses, as well as their medical accuracy and completeness. ChatGPT was not able to provide all the information contained in the informed consent form, but it did communicate information that was not listed there. In some, albeit minor, cases, ChatGPT made medically inaccurate claims, such as listing kyphoplasty and vertebroplasty as surgical options for LDH.

Conclusion: With the incipient use of artificial intelligence in communication, LLMs will certainly become increasingly important to patients. Even though LLMs are unlikely to play a role in clinical communication between physicians and patients at present, the opportunities, but also the risks, of this novel technology should be monitored attentively.

Keywords: ChatGPT; Large language model; Patient information; Spinal surgery.