Performance of ChatGPT in French language Parcours d'Accès Spécifique Santé test and in OBGYN

Paul-Adrien Guigue; Raanan Meyer; Gaetan Thivolle-Lioux; Yoav Brezinov; Gabriel Levin

doi:10.1002/ijgo.15083

Int J Gynaecol Obstet. 2024 Mar;164(3):959-963. doi: 10.1002/ijgo.15083. Epub 2023 Sep 1.

Authors

Paul-Adrien Guigue¹ ², Raanan Meyer³ ⁴ ⁵, Gaetan Thivolle-Lioux¹ ⁶, Yoav Brezinov², Gabriel Levin² ⁷

Affiliations

¹ University Claude Bernard Lyon I, Lyon, France.
² Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Montreal, Quebec, Canada.
³ Department of Obstetrics and Gynecology, Chaim Sheba Medical Center, Ramat-Gan, Israel.
⁴ Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel.
⁵ Cedars-Sinai Medical Center, Los Angeles, California, USA.
⁶ Centre de Recherche en Cancérologie de Lyon (CRCL), Lyon, France.
⁷ The Department of Gynecologic Oncology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel.

PMID: 37655838
DOI: 10.1002/ijgo.15083

Abstract

Objectives: To evaluate the performance of ChatGPT in a French medical school entrance examination.

Methods: We conducted a cross-sectional study using a consecutive sample of text-based multiple-choice practice questions for the Parcours d'Accès Spécifique Santé. ChatGPT answered the questions in French. We compared the performance of ChatGPT in obstetrics and gynecology (OBGYN) with its performance in the whole test.

Results: Overall, 885 questions were evaluated. The mean test score was 34.0% (306 of a maximal score of 900). ChatGPT scored 33.0% (292 correct answers out of 885 questions). ChatGPT performed worse in biostatistics (13.3% ± 19.7%) than in anatomy (34.2% ± 17.9%; P = 0.037) and in histology and embryology (40.0% ± 18.5%; P = 0.004). The OBGYN part comprised 290 questions. Neither the test scores nor the performance of ChatGPT differed between OBGYN and the whole entrance test (P = 0.76 and P = 0.10, respectively).
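
The two headline percentages in the Results can be rechecked from the reported counts. The following is a minimal sketch in Python, assuming a plain count-over-total definition of each percentage. The helper name `percent` is hypothetical and is not part of the study's methods; only the counts 292, 885, 306, and 900 come from the abstract.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal.

    Hypothetical helper for illustration; not taken from the study.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, 1)


# Counts as reported in the abstract.
chatgpt_pct = percent(292, 885)    # ChatGPT correct answers / questions
mean_test_pct = percent(306, 900)  # mean test score / maximal score

print(f"ChatGPT: {chatgpt_pct}%")            # ChatGPT: 33.0%
print(f"Mean test score: {mean_test_pct}%")  # Mean test score: 34.0%
```

Both values agree with the reported 33.0% and 34.0%. The per-subject comparisons and their P values cannot be rechecked this way, because the abstract does not state the per-subject counts or the statistical test used.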

Conclusions: ChatGPT answered one-third of the questions correctly in the French practice test. Its performance in OBGYN was similar.

Keywords: ChatGPT; French; OBGYN; large language models; performance; test.

MeSH terms

Biometry
Cross-Sectional Studies
Female
Gynecology*
Humans
Language
Obstetrics*
Pregnancy