A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment

Kaja Zupanc; Erik Štrumbelj

doi:10.1371/journal.pone.0195297

PLoS One. 2018 Apr 3;13(4):e0195297. doi: 10.1371/journal.pone.0195297. eCollection 2018.

Authors

Kaja Zupanc¹, Erik Štrumbelj¹

Affiliation

¹ Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia.

Abstract

We propose a novel approach to modelling rater effects in scoring-based assessment. The approach is based on a Bayesian hierarchical model and simulations from the posterior distribution. We apply it to large-scale essay assessment data collected over a 5-year period. Empirical results suggest that the model provides a good fit both for total scores and for individual rubrics. We estimate the median impact of rater effects on the final grade to be ± 2 points on a 50-point scale, while 10% of essays would receive a score at least 5 points above or below their actual quality. Most of the impact is due to rater unreliability, not rater bias.
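The distinction between rater bias (a systematic, per-rater shift) and rater unreliability (score-to-score noise) can be illustrated with a small forward simulation. The sketch below is not the authors' model: it is a minimal additive toy in which an observed score equals latent essay quality plus a rater-specific bias plus independent noise. The function name `simulate_rater_effects` and all parameter values (`bias_sd`, `noise_sd`, the number of essays and raters) are assumptions chosen for illustration only, and the bounds of the 50-point scale are ignored.

```python
import random
import statistics


def simulate_rater_effects(n_essays=2000, n_raters=50,
                           bias_sd=1.0, noise_sd=3.0, seed=0):
    """Toy additive model: observed = quality + rater bias + noise.

    All parameter values are illustrative assumptions, not estimates
    taken from the paper.
    """
    rng = random.Random(seed)
    # One systematic offset per rater (bias).
    rater_bias = [rng.gauss(0.0, bias_sd) for _ in range(n_raters)]

    total_errors = []      # |observed - quality| with bias and noise
    bias_only_errors = []  # |bias| alone, i.e. a perfectly reliable rater
    for _ in range(n_essays):
        quality = rng.uniform(0.0, 50.0)       # latent essay quality
        rater = rng.randrange(n_raters)        # randomly assigned rater
        noise = rng.gauss(0.0, noise_sd)       # unreliability
        observed = quality + rater_bias[rater] + noise
        total_errors.append(abs(observed - quality))
        bias_only_errors.append(abs(rater_bias[rater]))

    return {
        "median_abs_error": statistics.median(total_errors),
        # Last of the 9 decile cut points is the 90th percentile.
        "p90_abs_error": statistics.quantiles(total_errors, n=10)[-1],
        "median_abs_error_bias_only": statistics.median(bias_only_errors),
    }


result = simulate_rater_effects()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed settings the noise term is larger than the bias term, so removing noise shrinks the typical error far more than removing bias would. This mirrors only the qualitative shape of the abstract's conclusion; the paper's figures come from posterior simulation on real assessment data, not from a toy like this one.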

Publication types

Validation Study

MeSH terms

Academic Performance
Bayes Theorem
Computer Simulation
Humans
Judgment
Models, Statistical*
Observer Variation*
Reproducibility of Results*
Writing

Grants and funding

The authors received no specific funding for this work.