Bard Versus the 2022 American Society of Plastic Surgeons In-Service Examination: Performance on the Examination in Its Intern Year

Daniel Najafali; Erik Reiche; Sthefano Araya; Justin M Camacho; Farrah C Liu; Thomas Johnstone; Sameer A Patel; Shane D Morrison; Amir H Dorafshar; Paige M Fox

doi:10.1093/asjof/ojad066

Aesthet Surg J Open Forum. 2023 Jul 19:6:ojad066. doi: 10.1093/asjof/ojad066. eCollection 2024.

Abstract

Background: Bard is a conversational generative artificial intelligence (AI) platform released by Google (Mountain View, CA) to the public in May 2023.

Objectives: This study investigates the performance of Bard on the American Society of Plastic Surgeons (ASPS) In-Service Examination and compares it with the performance of residents nationally. We hypothesized that Bard would perform best on the comprehensive and core surgical principles portions of the examination.

Methods: Google's 2023 Bard was used to answer questions from the 2022 ASPS In-Service Examination. Each question was asked as written with the stem and multiple-choice options. The 2022 ASPS Norm Table was utilized to compare Bard's performance to that of subgroups of plastic surgery residents.

Results: A total of 231 questions were included. Bard answered 143 questions correctly, corresponding to an accuracy of 62%. The highest-performing section was the comprehensive portion (73%). When compared with integrated residents nationally, Bard scored in the 74th percentile for post-graduate year (PGY)-1, 34th percentile for PGY-2, 20th percentile for PGY-3, 8th percentile for PGY-4, 1st percentile for PGY-5, and 2nd percentile for PGY-6.
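The overall accuracy figure follows directly from the two counts reported above. As a minimal sketch of that arithmetic only (the function name `accuracy_percent` is an illustrative assumption and is not taken from the study, and the percentile comparison is not reproduced because the ASPS Norm Table values are not given here):

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return the share of correctly answered questions as a percentage."""
    if total <= 0:
        raise ValueError("total must be a positive number of questions")
    return 100.0 * correct / total


# Counts as reported in the Results: 143 correct out of 231 included questions.
bard_accuracy = accuracy_percent(143, 231)

# Rounded to the nearest whole percent, as in the abstract.
print(f"Bard accuracy: {bard_accuracy:.0f}%")
```

Rounding to a whole percent matches the precision used in the abstract; the unrounded value is slightly below 62%.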

Conclusions: Bard outperformed more than half of the first-year integrated residents (74th percentile). Its best sections were the comprehensive and core surgical principles portions of the examination. Further analysis of the questions the chatbot answered incorrectly might help improve the overall quality of the examination's questions.