An experimental comparison of multiple-choice and short-answer questions on a high-stakes test for medical students

Janet Mee; Ravi Pandian; Justin Wolczynski; Amy Morales; Miguel Paniagua; Polina Harik; Peter Baldwin; Brian E Clauser

doi:10.1007/s10459-023-10266-3

Adv Health Sci Educ Theory Pract. 2023 Sep 4. doi: 10.1007/s10459-023-10266-3. Online ahead of print.

Authors

Janet Mee¹, Ravi Pandian¹, Justin Wolczynski¹, Amy Morales¹, Miguel Paniagua², Polina Harik¹, Peter Baldwin¹, Brian E Clauser³

Affiliations

¹ NBME, Philadelphia, USA.
² American College of Physicians, Philadelphia, USA.
³ NBME, Philadelphia, USA. bclauser@nbme.org.

PMID: 37665413
DOI: 10.1007/s10459-023-10266-3

Abstract

Recent advances in automated scoring technology have made it practical to replace multiple-choice questions (MCQs) with short-answer questions (SAQs) in large-scale, high-stakes assessments. However, most previous research comparing these formats has used small examinee samples testing under low-stakes conditions. Additionally, previous studies have not reported on the time required to respond to the two item types. This study compares the difficulty, discrimination, and time requirements for the two formats when examinees responded as part of a large-scale, high-stakes assessment. Seventy-one MCQs were converted to SAQs. These matched items were randomly assigned to examinees completing a high-stakes assessment of internal medicine. No examinee saw the same item in both formats. Items administered in the SAQ format were generally more difficult than items in the MCQ format. The discrimination index for SAQs was modestly higher than that for MCQs, and response times were substantially longer for SAQs. These results support the interchangeability of MCQs and SAQs. When it is important that the examinee generate the response rather than select it, SAQs may be preferred. The results relating to difficulty and discrimination reported in this paper are consistent with those of previous studies. The results on the relative time requirements for the two formats suggest that with a fixed testing time fewer SAQs can be administered; this limitation more than offsets the higher discrimination that has been reported for SAQs. We additionally examine the extent to which increased difficulty may directly impact the discrimination of SAQs.

Keywords: Constructed response; Item performance; Multiple choice; Short answer.
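
For readers unfamiliar with the item statistics named in the abstract, the following is a minimal sketch of how difficulty and discrimination are commonly computed. The abstract does not say which discrimination index the authors used; the point-biserial correlation below is an assumption chosen for illustration, and the function names and response data are hypothetical, not taken from the study.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Proportion of examinees scoring 1 on the item (lower = harder)."""
    return mean(item_scores)


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and a total test score.

    Assumed here as the discrimination index; the study may use another.
    Returns NaN when either variable has no variance.
    """
    mx, my = mean(item_scores), mean(total_scores)
    sx, sy = pstdev(item_scores), pstdev(total_scores)
    if sx == 0 or sy == 0:
        return float("nan")
    cov = mean((x - mx) * (y - my) for x, y in zip(item_scores, total_scores))
    return cov / (sx * sy)


# Hypothetical data: two separate examinee groups see the same item,
# one in MCQ format and one in SAQ format (no examinee sees both).
mcq_item = [1, 1, 1, 0, 1, 1, 0, 1]
mcq_total = [82, 75, 90, 55, 70, 88, 60, 79]
saq_item = [1, 0, 1, 0, 1, 0, 0, 1]
saq_total = [85, 62, 91, 50, 77, 66, 58, 80]

print("MCQ difficulty:", item_difficulty(mcq_item))
print("SAQ difficulty:", item_difficulty(saq_item))
print("MCQ discrimination:", round(point_biserial(mcq_item, mcq_total), 3))
print("SAQ discrimination:", round(point_biserial(saq_item, saq_total), 3))
```

In this made-up example the SAQ version has a lower proportion correct, which mirrors the direction of the difficulty result described in the abstract; the numbers themselves carry no evidential weight.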