Performance of ChatGPT in Board Examinations for Specialists in the Japanese Ophthalmology Society

Daiki Sakai; Tadao Maeda; Atsuta Ozaki; Genki N Kanda; Yasuo Kurimoto; Masayo Takahashi

doi:10.7759/cureus.49903

Cureus. 2023 Dec 4;15(12):e49903. doi: 10.7759/cureus.49903. eCollection 2023 Dec.

Authors

Daiki Sakai¹ ² ³, Tadao Maeda¹, Atsuta Ozaki¹ ⁴, Genki N Kanda¹ ⁵, Yasuo Kurimoto¹ ², Masayo Takahashi¹

Affiliations

¹ Department of Ophthalmology, Kobe City Eye Hospital, Kobe, JPN.
² Department of Ophthalmology, Kobe City Medical Center General Hospital, Kobe, JPN.
³ Department of Surgery, Division of Ophthalmology, Kobe University Graduate School of Medicine, Kobe, JPN.
⁴ Department of Ophthalmology, Mie University Graduate School of Medicine, Tsu, JPN.
⁵ Laboratory for Biologically Inspired Computing, RIKEN Center for Biosystems Dynamics Research, Kobe, JPN.

Abstract

We investigated the potential of ChatGPT in Japanese-language ophthalmology using the board examinations for specialists of the Japanese Ophthalmology Society. In July 2023, we tested ChatGPT based on GPT-3.5 and on GPT-4 with five sets of past board examination problems. The prompts were written in Japanese and followed two strategies: zero-shot and few-shot prompting. We compared the correct answer rate of ChatGPT with that of actual examinees and assessed its performance characteristics across 10 subspecialties. With simple zero-shot prompting, ChatGPT-3.5 and ChatGPT-4 correctly answered 112 (22.4%) and 229 (45.8%) of 500 questions, respectively; with few-shot prompting, ChatGPT-4 correctly answered 231 (46.2%) questions. The correct answer rates of ChatGPT-3.5 were approximately two to three times lower than those of the actual examinees for each examination set (p = 0.001). In contrast, the correct answer rates of ChatGPT-4 were approximately 70% of those of the examinees. ChatGPT-4 had its highest correct answer rate (71.4% with zero-shot prompting and 61.9% with few-shot prompting) in "blepharoplasty, orbit, and ocular oncology," and its lowest correct answer rate (30.0% with zero-shot prompting and 23.3% with few-shot prompting) in "pediatric ophthalmology." We concluded that ChatGPT could be one of the advanced technologies underpinning practical tools in Japanese ophthalmology.

Keywords: artificial intelligence; board examination; chatgpt; generative artificial intelligence; large language models; ophthalmology.
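
The overall correct answer rates quoted in the abstract follow directly from the reported counts out of 500 questions. The short Python sketch below recomputes them as a consistency check; the variable and function names are illustrative assumptions and are not taken from the study.

```python
# Counts of correctly answered questions as reported in the abstract.
# The dictionary keys and all names here are illustrative, not the authors'.
TOTAL_QUESTIONS = 500

correct_counts = {
    "ChatGPT-3.5, zero-shot": 112,
    "ChatGPT-4, zero-shot": 229,
    "ChatGPT-4, few-shot": 231,
}


def correct_answer_rate(correct: int, total: int) -> float:
    """Return the correct answer rate as a percentage, rounded to one decimal."""
    return round(100 * correct / total, 1)


# Percentage of correct answers for each model and prompting strategy.
rates = {
    name: correct_answer_rate(count, TOTAL_QUESTIONS)
    for name, count in correct_counts.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

The recomputed values match the percentages stated in the abstract (22.4%, 45.8%, and 46.2%).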