Diagnostic capabilities of ChatGPT in ophthalmology

Graefes Arch Clin Exp Ophthalmol. 2024 Jan 6. doi: 10.1007/s00417-023-06363-z. Online ahead of print.

Abstract

Purpose: To assess the diagnostic accuracy of ChatGPT in the field of ophthalmology.

Methods: This retrospective cohort study was conducted at a single academic tertiary medical center. We reviewed the records of patients admitted to the ophthalmology department from 06/2022 to 01/2023 and created two clinical cases for each patient: the first based on the medical history alone (Hx), and the second adding the clinical examination findings (Hx and Ex). For each case, we asked ChatGPT, residents, and attendings for the three most likely diagnoses and compared the accuracy rates (at least one correct diagnosis) across the groups. We also compared the total time each group took to complete the assignment.

Results: ChatGPT, residents, and attendings evaluated 126 cases from 63 patients (history only, and history plus examination findings, for each patient). On the Hx cases, ChatGPT achieved a significantly lower rate of accurate diagnoses (54%) than the residents (75%; p < 0.01) and attendings (71%; p < 0.01). With the clinical examination findings added, ChatGPT's accuracy rose to 68%, compared with 94% for the residents (p < 0.01) and 86% for the attendings (p < 0.01). ChatGPT completed the task 4 to 5 times faster than the attendings and residents.

Conclusions and relevance: ChatGPT showed lower diagnostic accuracy than residents and attendings in ophthalmology cases, whether based on patient history alone or combined with clinical examination findings. However, ChatGPT completed the task substantially faster than the physicians.

Keywords: Artificial intelligence; ChatGPT; Diagnosis; Ophthalmology; Residents.