A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment

PLoS One. 2018 Apr 3;13(4):e0195297. doi: 10.1371/journal.pone.0195297. eCollection 2018.

Abstract

We propose a novel approach to modelling rater effects in scoring-based assessment. The approach is based on a Bayesian hierarchical model and simulations from the posterior distribution. We apply it to large-scale essay assessment data collected over a period of 5 years. Empirical results suggest that the model fits well both the total scores and the scores on individual rubrics. We estimate the median impact of rater effects on the final grade to be ±2 points on a 50-point scale, while 10% of essays would receive a score that differs by at least 5 points from their actual quality. Most of the impact is due to rater unreliability, not rater bias.
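
The distinction between rater bias (a systematic shift per rater) and rater unreliability (random scatter around that shift) can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes a simple additive error, score error = rater bias + random noise, with normally distributed components. The function name `simulate_rater_impact` and all parameter values (`bias_sd`, `noise_sd`, the number of raters and essays) are hypothetical choices made for illustration, not estimates from the paper.

```python
import numpy as np


def simulate_rater_impact(n_essays=10_000, n_raters=50,
                          bias_sd=1.0, noise_sd=3.0, seed=0):
    """Simulate score errors under an ASSUMED additive rater model.

    error = bias[rater] + noise
    All parameter values are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Systematic severity/leniency of each rater (rater bias).
    bias = rng.normal(0.0, bias_sd, size=n_raters)

    # Each essay is scored by one randomly assigned rater.
    rater = rng.integers(0, n_raters, size=n_essays)

    # Random scatter in each individual scoring (rater unreliability).
    noise = rng.normal(0.0, noise_sd, size=n_essays)

    error = bias[rater] + noise
    abs_error = np.abs(error)

    return {
        # Typical size of the rater effect on a score.
        "median_abs_error": float(np.median(abs_error)),
        # 10% of essays have an absolute error at least this large.
        "p90_abs_error": float(np.quantile(abs_error, 0.9)),
        # Share of the error variance attributable to rater bias.
        "bias_share": float(np.var(bias[rater]) / np.var(error)),
    }


if __name__ == "__main__":
    for name, value in simulate_rater_impact().items():
        print(f"{name}: {value:.2f}")
```

With `noise_sd` set larger than `bias_sd`, as assumed here, most of the error variance comes from the noise term, which mirrors the qualitative conclusion that unreliability rather than bias drives the impact. A full Bayesian treatment would instead infer these components from the observed scores through the posterior distribution.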

Publication types

  • Validation Study

MeSH terms

  • Academic Performance
  • Bayes Theorem
  • Computer Simulation
  • Humans
  • Judgment
  • Models, Statistical*
  • Observer Variation*
  • Reproducibility of Results*
  • Writing

Grants and funding

The authors received no specific funding for this work.