Performance of an Artificial Intelligence Chatbot in Ophthalmic Knowledge Assessment

Andrew Mihalache; Marko M Popovic; Rajeev H Muni

doi:10.1001/jamaophthalmol.2023.1144

JAMA Ophthalmol. 2023 Jun 1;141(6):589-597. doi: 10.1001/jamaophthalmol.2023.1144.

Authors

Andrew Mihalache¹, Marko M Popovic², Rajeev H Muni²,³

Affiliations

¹ Schulich School of Medicine and Dentistry, University of Western Ontario, London, Ontario, Canada.
² Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada.
³ Department of Ophthalmology, St Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada.

Abstract

Importance: ChatGPT is an artificial intelligence (AI) chatbot that has significant societal implications. Training curricula using AI are being developed in medicine, and the performance of chatbots in ophthalmology has not been characterized.

Objective: To assess the performance of ChatGPT in answering practice questions for board certification in ophthalmology.

Design, setting, and participants: This cross-sectional study used a consecutive sample of text-based multiple-choice questions provided by the OphthoQuestions practice question bank for board certification examination preparation. Of 166 available multiple-choice questions, 125 (75%) were text-based.

Exposures: ChatGPT answered questions from January 9 to 16, 2023, and on February 17, 2023.

Main outcomes and measures: Our primary outcome was the number of board certification examination practice questions that ChatGPT answered correctly. Our secondary outcomes were the proportion of questions for which ChatGPT provided additional explanations, the mean length of questions and responses provided by ChatGPT, the performance of ChatGPT in answering questions without multiple-choice options, and changes in performance over time.

Results: In January 2023, ChatGPT correctly answered 58 of 125 questions (46%). ChatGPT's performance was the best in the category general medicine (11/14; 79%) and poorest in retina and vitreous (0%). The proportion of questions for which ChatGPT provided additional explanations was similar between questions answered correctly and incorrectly (difference, 5.82%; 95% CI, -11.0% to 22.0%; χ²₁ = 0.45; P = .51). The mean length of questions was similar between questions answered correctly and incorrectly (difference, 21.4 characters; SE, 36.8; 95% CI, -51.4 to 94.3; t = 0.58; df = 123; P = .22). The mean length of responses was similar between questions answered correctly and incorrectly (difference, -80.0 characters; SE, 65.4; 95% CI, -209.5 to 49.5; t = -1.22; df = 123; P = .22). ChatGPT selected the same multiple-choice response as the most common answer provided by ophthalmology trainees on OphthoQuestions 44% of the time. In February 2023, ChatGPT provided a correct response to 73 of 125 multiple-choice questions (58%) and 42 of 78 stand-alone questions (54%) without multiple-choice options.
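
The percentages and t statistics reported above can be rechecked from the counts, differences, and standard errors given in the text. The following is a minimal sketch, not the authors' analysis code: the helper names `pct` and `t_stat` are hypothetical, and it assumes each t statistic is simply the reported mean difference divided by its reported standard error.

```python
# Illustrative recomputation of figures quoted in the Results paragraph.
# All input numbers are copied from the abstract; the helper names are
# hypothetical and are not taken from the study.


def pct(count: int, total: int) -> float:
    """Return count/total expressed as a percentage."""
    return 100.0 * count / total


def t_stat(difference: float, standard_error: float) -> float:
    """Return a t statistic, assuming t = mean difference / standard error."""
    return difference / standard_error


# (label, count, total) triples taken from the abstract.
proportions = [
    ("text-based questions", 125, 166),
    ("January 2023 correct", 58, 125),
    ("general medicine correct", 11, 14),
    ("February 2023 multiple-choice correct", 73, 125),
    ("February 2023 stand-alone correct", 42, 78),
]

# (label, difference in characters, standard error) taken from the abstract.
length_comparisons = [
    ("question length", 21.4, 36.8),
    ("response length", -80.0, 65.4),
]

for label, count, total in proportions:
    print(f"{label}: {count}/{total} = {pct(count, total):.1f}%")

for label, difference, standard_error in length_comparisons:
    print(f"{label}: t = {t_stat(difference, standard_error):.2f}")
```

Rounded to whole percentages and two decimal places, these values agree with the proportions and t statistics quoted in the abstract.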

Conclusions and relevance: ChatGPT answered approximately half of questions correctly in the OphthoQuestions free trial for ophthalmic board certification preparation. Medical professionals and trainees should appreciate the advances of AI in medicine while acknowledging that ChatGPT as used in this investigation did not answer sufficient multiple-choice questions correctly for it to provide substantial assistance in preparing for board certification at this time.

MeSH terms

Artificial Intelligence*
Cross-Sectional Studies
Curriculum
Humans
Ophthalmology*
Retina