The utility and accuracy of ChatGPT in providing post-operative instructions following tonsillectomy: A pilot study

Sarit Dhar; Dhruv Kothari; Missael Vasquez; Travis Clarke; Andrew Maroda; Wade G McClain; Anthony Sheyn; Robert M Tuliszewski; Dennis M Tang; Sanjeet V Rangarajan

doi:10.1016/j.ijporl.2024.111901

Int J Pediatr Otorhinolaryngol. 2024 Apr:179:111901. doi: 10.1016/j.ijporl.2024.111901. Epub 2024 Feb 29.

Affiliations

¹ Department of Otolaryngology Head & Neck Surgery, University of Tennessee Health Science Center, 910 Madison Ave, Memphis, TN, 38163, USA.
² Department of Otolaryngology Head & Neck Surgery, University of Tennessee Health Science Center, 910 Madison Ave, Memphis, TN, 38163, USA; Department of Otolaryngology Head & Neck Surgery, Cedars-Sinai Medical Center, 8700 Beverly Blvd, Los Angeles, CA, 90048, USA.
³ Department of Otolaryngology Head & Neck Surgery, Cedars-Sinai Medical Center, 8700 Beverly Blvd, Los Angeles, CA, 90048, USA.
⁴ Department of Otolaryngology-Head and Neck Surgery, University Hospitals Cleveland Medical Center, Case Western Reserve University School of Medicine, 11100 Euclid Ave, Cleveland, OH, 44106, USA. Electronic address: sanjeet.rangarajan@uhhospitals.org.

PMID: 38447265

Abstract

Objective: To investigate the utility of answers generated by ChatGPT, a large language model, to common questions parents have for their children following tonsillectomy.

Methods: Twenty Otolaryngology residents anonymously submitted common questions asked by parents of pediatric patients following tonsillectomy. After identifying the 16 most common questions via a consensus-based approach, we asked ChatGPT to generate responses to these queries. An expert panel of 3 pediatric Otolaryngologists rated their satisfaction with each AI-generated answer from 1 (Worst) to 5 (Best).

Results: The questions fell into five domains. For each domain, the number of questions, mean satisfaction score, and Krippendorff's interrater reliability coefficient were: Pain management (6; 3.67; 0.434), Complications (4; 3.58; -0.267), Diet (3; 4.33; -0.357), Physical Activity (2; 4.33; -0.318), and Follow-up (1; 2.67; -0.250). The panel noted that answers for diet, bleeding complications, and return to school were thorough. Pain management and follow-up recommendations were inaccurate, including a recommendation to prescribe codeine to children despite a black-box warning, and a suggested post-operative follow-up at 1 week rather than the 2-4 weeks customary for our panel.
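To help interpret the reliability coefficients reported above, the following is a minimal sketch of how Krippendorff's alpha can be computed for a small panel. It rests on several assumptions that the abstract does not confirm: the interval (squared-difference) distance metric, complete ratings with no missing values, and the function name `krippendorff_alpha_interval`, which is purely illustrative. The ratings in the usage lines are hypothetical and are not the study's data. The sketch shows why alpha can be negative: when raters disagree within each question more than would be expected by chance across all ratings, observed disagreement exceeds expected disagreement.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric (an assumed choice).

    units: list of lists, one inner list per question, holding the
    numeric ratings given by each rater for that question.
    Assumes no missing ratings; questions with fewer than two
    ratings are ignored because they contain no pairable values.
    """
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)

    # Observed disagreement: squared differences between ratings
    # of the same question, averaged over all pairable values.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: squared differences between all pairs
    # of ratings, regardless of which question they belong to.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))

    if d_e == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1 - d_o / d_e


# Hypothetical ratings from 3 raters on 3 questions (not the study's data).
perfect_agreement = [[1, 1, 1], [3, 3, 3], [5, 5, 5]]
systematic_disagreement = [[4, 5, 3], [5, 3, 4], [3, 4, 5]]

print(round(krippendorff_alpha_interval(perfect_agreement), 3))        # 1.0
print(round(krippendorff_alpha_interval(systematic_disagreement), 3))  # -0.333
```

In the second hypothetical case every question receives the same spread of scores, so all of the variation lies between raters within a question. Observed disagreement (2.0) then exceeds expected disagreement (1.5), and alpha falls below zero.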

Conclusion: Although ChatGPT can provide accurate answers for common patient questions following tonsillectomy, it sometimes provides eloquently written but inaccurate information. This may lead patients to follow AI-generated medical advice that contradicts their physician's advice. The inaccuracy in pain management answers likely reflects regional practice variability. If trained appropriately, ChatGPT could be an excellent resource for Otolaryngologists and patients to answer questions in the postoperative period. Future research should investigate whether Otolaryngologist-trained models can increase the accuracy of responses.

Keywords: Artificial intelligence; ChatGPT; Machine learning; Otolaryngology; Post-operative instructions; Tonsillectomy.

MeSH terms

Child
Consensus
Humans
Pilot Projects
Postoperative Period
Reproducibility of Results
Tonsillectomy* / adverse effects