Evaluating ChatGPT responses on obstructive sleep apnea for patient education

J Clin Sleep Med. 2023 Dec 1;19(12):1989-1995. doi: 10.5664/jcsm.10728.

Abstract

Study objectives: We evaluated the quality of ChatGPT responses to questions on obstructive sleep apnea for patient education and assessed how prompting the chatbot influences the correctness, estimated reading grade level, and inclusion of references in its answers.

Methods: ChatGPT was queried 4 times with 24 identical questions. Queries differed only by the initial prompt: no prompting, patient-friendly prompting, physician-level prompting, and prompting for statistics/references. Answers were scored on a hierarchical scale: incorrect, partially correct, correct, correct with either a statistic or a referenced citation ("correct+"), or correct with both a statistic and a citation ("perfect"). The Flesch-Kincaid grade level and the publication years of any citations were recorded for each answer. Proportions of responses at incremental score thresholds were compared across prompt types using chi-squared analysis. The relationship between prompt type and grade level was assessed using analysis of variance.
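
The analysis code is not part of the abstract; the minimal Python sketch below, using entirely hypothetical counts and grade levels, illustrates the type of comparisons described above: chi-squared tests on score-threshold proportions across the four prompt types and one-way analysis of variance on Flesch-Kincaid grade levels. The textstat package and all variable names are assumptions for illustration only, not the authors' methods.

    # Minimal sketch, not the authors' code: hypothetical data throughout.
    from scipy.stats import chi2_contingency, f_oneway
    import textstat  # third-party readability package (assumed available)

    # Hypothetical counts of answers reaching the "correct+" threshold for each
    # of the four prompt types (24 questions per prompt).
    reached = [3, 2, 4, 20]            # no / patient-friendly / physician / stats-refs prompting
    missed = [24 - r for r in reached]
    chi2, p, dof, _ = chi2_contingency([reached, missed])
    print(f"chi-squared P for 'at least correct+' by prompt type: {p:.4f}")

    # Hypothetical Flesch-Kincaid grade levels, one list per prompt type.
    grades = [
        [14.1, 13.8, 15.0],   # no prompting
        [12.2, 11.9, 13.1],   # patient-friendly prompting
        [14.5, 14.0, 14.3],   # physician-level prompting
        [15.2, 14.8, 15.1],   # statistics/references prompting
    ]
    f_stat, p_anova = f_oneway(*grades)
    print(f"ANOVA P for grade level by prompt type: {p_anova:.4f}")

    # The grade level of a single response could be estimated automatically, e.g.:
    text = ("Obstructive sleep apnea is a disorder in which breathing "
            "repeatedly stops and starts during sleep.")
    print("Flesch-Kincaid grade level:", textstat.flesch_kincaid_grade(text))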

Results: Across all prompts (n = 96 questions), 69 answers (71.9%) were at least correct. The proportions of responses that were at least partially correct (P = .387) or at least correct (P = .453) did not differ by prompt type, whereas the proportions that were at least correct+ (P < .001) or perfect (P < .001) did. Statistics/references prompting provided 74/77 (96.1%) references. Responses generated with patient-friendly prompting had a lower mean grade level (12.45 ± 2.32) than those with no prompting (14.15 ± 1.59), physician-level prompting (14.27 ± 2.09), or statistics/references prompting (15.00 ± 2.26) (P < .0001).
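
For context, the Flesch-Kincaid grade level reported above is a standard readability index that estimates the U.S. school grade needed to understand a text, computed from word, sentence, and syllable counts:

    \[
    \mathrm{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59
    \]

The mean grade levels of roughly 12 to 15 reported here therefore correspond to high-school-senior through college-level reading.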

Conclusions: ChatGPT provides appropriate answers to most questions on obstructive sleep apnea regardless of prompting. Although patient-friendly prompting lowered the response grade level, all responses remained above accepted reading-level recommendations for presenting medical information to patients. Given ChatGPT's rapid adoption, sleep experts may seek to further scrutinize its medical literacy and its utility for patients.

Citation: Campbell DJ, Estephan LE, Mastrolonardo EV, Amin DR, Huntley CT, Boon MS. Evaluating ChatGPT responses on obstructive sleep apnea for patient education. J Clin Sleep Med. 2023;19(12):1989-1995.

Keywords: artificial intelligence; obstructive sleep apnea; patient education; sleep surgery.

MeSH terms

  • Humans
  • Patient Education as Topic
  • Physicians*
  • Sleep
  • Sleep Apnea, Obstructive* / therapy
  • Software