Can ChatGPT pass the thoracic surgery exam?

Am J Med Sci. 2023 Oct;366(4):291-295. doi: 10.1016/j.amjms.2023.08.001. Epub 2023 Aug 6.

Abstract

Background: The capabilities of ChatGPT in academic settings and on medical examinations are being explored at an increasing pace. In this study, we assessed the performance of ChatGPT on Turkish-language thoracic surgery examination questions.

Methods: ChatGPT was given a total of 105 questions divided into seven distinct groups of 15 questions each. The number of questions answered correctly by the ChatGPT-3.5 and ChatGPT-4 models was analyzed alongside the students' performance.

Results: The students' overall mean score was 12.50 ± 1.20, corresponding to 83.33%. ChatGPT-3.5 surpassed the students' mean score of 12.50, answering 13.57 ± 0.49 questions correctly on average, while ChatGPT-4 answered 14.00 ± 0.76 correctly (83.33%, 90.48%, and 93.33%, respectively).
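As a quick sanity check, the percentages quoted in the Results follow from dividing each mean score by the 15 questions per group. This is a minimal sketch using only the values stated in the abstract; note that 13.57/15 rounds to 90.47%, so the reported 90.48% for ChatGPT-3.5 was presumably computed from the unrounded mean.

```python
# Convert the mean scores reported in the abstract (out of 15 questions)
# into percentages, as quoted in the Results.
TOTAL_QUESTIONS = 15

mean_scores = {
    "students": 12.50,
    "ChatGPT-3.5": 13.57,
    "ChatGPT-4": 14.00,
}

for group, mean_correct in mean_scores.items():
    pct = mean_correct / TOTAL_QUESTIONS * 100
    print(f"{group}: {pct:.2f}%")
```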

Conclusions: Taken together with similar studies in the literature, these results indicate that ChatGPT, although developed as a general-purpose model, can also perform well in a specific field of medicine. AI-powered applications are becoming increasingly useful and valuable in providing academic knowledge.

Keywords: Artificial intelligence (AI); ChatGPT; Large language models; Medical education; Thoracic surgery.

MeSH terms

  • Humans
  • Medicine*
  • Thoracic Surgery*