How sure can we be that a student really failed? On the measurement precision of individual pass-fail decisions from the perspective of Item Response Theory

Med Teach. 2020 Dec;42(12):1374-1384. doi: 10.1080/0142159X.2020.1811844. Epub 2020 Aug 28.

Abstract

Background: In high-stakes assessments in medical education, the decision to let a particular participant pass or fail has far-reaching consequences. Reliability coefficients are commonly reported to support the trustworthiness of assessments and the decisions based on them. However, coefficients such as Cronbach's alpha do not indicate how precisely an individual's performance was measured.
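To make the contrast concrete, here is a minimal Python sketch (not taken from the paper). It computes Cronbach's alpha from a persons × items score matrix and derives the classical standard error of measurement, SEM = SD × √(1 − α). The resulting SEM is a single number applied to every examinee, so it cannot express precision that differs from one score level to another. The function names and toy data are illustrative assumptions.

```python
from math import sqrt
from statistics import pstdev, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items matrix of item scores."""
    k = len(scores[0])
    # Variance of each item across examinees (population variance).
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the examinees' total scores.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def classical_sem(scores):
    """Classical SEM = SD(total) * sqrt(1 - alpha): one value for every examinee."""
    totals = [sum(row) for row in scores]
    return pstdev(totals) * sqrt(1 - cronbach_alpha(scores))


# Toy data: 4 examinees x 3 dichotomous items (1 = correct).
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(scores))  # 0.75
print(classical_sem(scores))
```

Population variances are used throughout. Alpha does not depend on that choice as long as it is applied consistently, because the n/(n − 1) factor cancels in the ratio.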

Objective: Since estimates of precision need to be aligned with the level on which inferences are made, we illustrate how to adequately report the precision of pass-fail decisions for single individuals.

Method: We show how to calculate the precision of individual pass-fail decisions using Item Response Theory and illustrate that approach using a real exam. In total, 70 students sat this exam (110 items). Reliability coefficients exceeded the value recommended for high-stakes tests (> 0.80). At the same time, pass-fail decisions around the cut score were expected to show low accuracy.

Conclusions: Our results illustrate that the most important decisions, i.e. those based on scores near the pass-fail cut score, are often ambiguous, and that reporting a traditional reliability coefficient does not adequately describe the uncertainty encountered at the individual level.

Keywords: Item Response Theory; Psychometrics; general; measurement precision; pass-fail decisions; reliability.

MeSH terms

  • Clinical Competence
  • Education, Medical*
  • Educational Measurement*
  • Humans
  • Reproducibility of Results
  • Students