Does ChatGPT Answer Otolaryngology Questions Accurately?

Laryngoscope. 2024 Mar 28. doi: 10.1002/lary.31410. Online ahead of print.

Abstract

Objective: Investigate the accuracy of ChatGPT in answering medical questions related to otolaryngology.

Methods: A ChatGPT session was opened in which 93 questions on otolaryngology topics were asked. Questions were drawn from all major domains within otolaryngology and were based on key action statements (KAS) from clinical practice guidelines (CPGs). Twenty-one "patient-level" questions were also asked of the program. Answers were graded as "correct," "partially correct," "incorrect," or "non-answer."

Results: Correct answers were given at a rate of 45.5% (71.4% patient-level, 37.3% CPG); partially correct answers at 31.8% (28.6% patient-level, 32.8% CPG); incorrect answers at 21.6% (0% patient-level, 28.4% CPG); and non-answers at 1.1% (0% patient-level, 1.5% CPG). There was no difference in the rate of correct answers between CPGs published before and CPGs published after the period of data collection cited by ChatGPT. Answers to CPG-based questions were less likely to be correct than answers to patient-level questions (p = 0.003).
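For readers who wish to sanity-check the comparison of correct-answer rates, the following is a minimal sketch, not taken from the paper: it back-calculates plausible answer counts from the reported percentages (assumed denominators of 21 patient-level and 67 CPG-based questions) and applies a Fisher's exact test. The abstract does not state which statistical test the authors used, so the resulting p-value is illustrative only and need not match the reported p = 0.003.

# Minimal sketch (not from the paper): back-calculate answer counts from the
# reported percentages and compare correct-answer rates between question types.
# Assumed denominators (21 patient-level, 67 CPG-based) and test choice
# (Fisher's exact) are assumptions; the p-value is illustrative only.
from scipy.stats import fisher_exact

patient_correct, patient_total = 15, 21   # 71.4% correct
cpg_correct, cpg_total = 25, 67           # 37.3% correct

table = [
    [patient_correct, patient_total - patient_correct],  # patient-level: correct / not correct
    [cpg_correct, cpg_total - cpg_correct],               # CPG-based: correct / not correct
]

odds_ratio, p_value = fisher_exact(table)
print(f"Patient-level correct rate: {patient_correct / patient_total:.1%}")
print(f"CPG-based correct rate:     {cpg_correct / cpg_total:.1%}")
print(f"Fisher's exact p-value:     {p_value:.3f}")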

Conclusion: Publicly available artificial intelligence software has become increasingly popular with consumers for everything from storytelling to data collection. In this study, we examined the accuracy of ChatGPT responses to otolaryngology questions across seven domains and 21 published CPGs. Physicians and patients should understand the limitations of this software as it applies to otolaryngology, and programmers of future iterations should consider giving greater weight to information published in well-established journals and written by national content experts.

Level of evidence: N/A

Keywords: computers; innovation; machine learning; patient advocacy.