Inter-Rater Reliability of Grading Undergraduate Portfolios in Veterinary Medical Education

J Vet Med Educ. 2019 Winter;46(4):415-422. doi: 10.3138/jvme.0917-128r1. Epub 2019 Mar 28.

Abstract

The reliability of high-stakes assessment of portfolios containing an aggregation of quantitative and qualitative data based on programmatic assessment is under debate, especially when multiple assessors are involved. In this study, carried out at the Faculty of Veterinary Medicine, Utrecht University, the Netherlands, two independent assessors graded the portfolios of students in the second year of the 3-year clinical phase. The similarity of the grades (i.e., whether the two assessors awarded equal grades) and the level of the grades were studied to estimate inter-rater reliability, taking into account the potential effects of the assessor's background (i.e., originating from a clinical or non-clinical department) and the student's cohort, gender, and chosen master track (Companion Animal Health, Equine Health, or Farm Animal/Public Health). Whereas the similarity between the two grades increased from 58% in the first year the grading system was introduced to around 80% in subsequent years, the level of the grades was lower over the next 3 years. The assessor's background had only a minor effect on the proportion of similar grades and on grading level. The intraclass correlation attributable to assessors was low, indicating that all assessors scored with a similar grading pattern (same range of grades). Grades awarded to female students were higher on average but more often dissimilar between assessors. We conclude that the grading system was well implemented and has high inter-rater reliability.
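
The abstract reports two reliability indicators: the proportion of identical grades between the two assessors and an assessor-level intraclass correlation. The sketch below illustrates, under assumptions, how such measures can be computed; the data, variable names, and the choice of ICC(2,1) are illustrative and not taken from the study.

```python
import numpy as np

# Hypothetical grades from two independent assessors for the same portfolios
# (values are illustrative only, not data from the study).
assessor_1 = np.array([7, 8, 6, 7, 9, 8, 7, 6, 8, 7])
assessor_2 = np.array([7, 8, 7, 7, 9, 8, 6, 6, 8, 7])

# Proportion of similar (i.e., equal) grades, as a percentage.
similarity = np.mean(assessor_1 == assessor_2) * 100
print(f"Exact agreement: {similarity:.0f}%")

# Two-way random-effects, single-rater intraclass correlation, ICC(2,1),
# computed from the mean squares of a portfolios-by-assessors layout.
ratings = np.column_stack([assessor_1, assessor_2])
n, k = ratings.shape
grand_mean = ratings.mean()
row_means = ratings.mean(axis=1)   # per-portfolio means
col_means = ratings.mean(axis=0)   # per-assessor means

ss_rows = k * np.sum((row_means - grand_mean) ** 2)
ss_cols = n * np.sum((col_means - grand_mean) ** 2)
ss_total = np.sum((ratings - grand_mean) ** 2)
ss_error = ss_total - ss_rows - ss_cols

ms_rows = ss_rows / (n - 1)
ms_cols = ss_cols / (k - 1)
ms_error = ss_error / ((n - 1) * (k - 1))

icc_2_1 = (ms_rows - ms_error) / (
    ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
)
print(f"ICC(2,1): {icc_2_1:.2f}")
```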

Keywords: high-stakes assessment; inter-rater reliability; portfolio; veterinary.

MeSH terms

  • Education, Medical, Undergraduate* / standards
  • Education, Veterinary* / standards
  • Educational Measurement*
  • Female
  • Humans
  • Male
  • Netherlands
  • Reproducibility of Results
  • Students