Inter-Rater Reliability of Grading Undergraduate Portfolios in Veterinary Medical Education

J Vet Med Educ. 2019 Winter;46(4):415-422. doi: 10.3138/jvme.0917-128r1. Epub 2019 Mar 28.

Abstract

The reliability of high-stakes assessment of portfolios containing an aggregation of quantitative and qualitative data based on programmatic assessment is under debate, especially when multiple assessors are involved. In this study, carried out at the Faculty of Veterinary Medicine, Utrecht University, the Netherlands, two independent assessors graded the portfolios of students in the second year of the 3-year clinical phase. The similarity of the grades (i.e., whether the two assessors awarded equal grades) and the level of the grades were studied to estimate inter-rater reliability, taking into account the potential effects of the assessor's background (i.e., originating from a clinical or non-clinical department) and the student's cohort, gender, and chosen master track (Companion Animal Health, Equine Health, or Farm Animal/Public Health). Whereas the similarity between the two grades increased from 58% in the first year the grading system was introduced to around 80% in subsequent years, the level of the grades was lower over the next 3 years. The assessor's background had only a minor effect on the proportion of similar grades and on grading level. The intraclass correlation attributable to assessors was low, indicating that all assessors scored with a similar grading pattern (same range of grades). Grades awarded to female students were higher on average but more often dissimilar between assessors. We conclude that the grading system was well implemented and has high inter-rater reliability.
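
The abstract reports two reliability indicators: the proportion of identical grades between the two assessors and an assessor-level intraclass correlation. The sketch below illustrates, under assumptions, how such measures can be computed; the data, variable names, and the choice of ICC(2,1) are illustrative and not taken from the study.

```python
import numpy as np

# Hypothetical grades from two independent assessors for the same portfolios
# (values are illustrative only, not data from the study).
assessor_1 = np.array([7, 8, 6, 7, 9, 8, 7, 6, 8, 7])
assessor_2 = np.array([7, 8, 7, 7, 9, 8, 6, 6, 8, 7])

# Proportion of similar (i.e., equal) grades, as a percentage.
similarity = np.mean(assessor_1 == assessor_2) * 100
print(f"Exact agreement: {similarity:.0f}%")

# Two-way random-effects, single-rater intraclass correlation, ICC(2,1),
# computed from the mean squares of a portfolios-by-assessors layout.
ratings = np.column_stack([assessor_1, assessor_2])
n, k = ratings.shape
grand_mean = ratings.mean()
row_means = ratings.mean(axis=1)   # per-portfolio means
col_means = ratings.mean(axis=0)   # per-assessor means

ss_rows = k * np.sum((row_means - grand_mean) ** 2)
ss_cols = n * np.sum((col_means - grand_mean) ** 2)
ss_total = np.sum((ratings - grand_mean) ** 2)
ss_error = ss_total - ss_rows - ss_cols

ms_rows = ss_rows / (n - 1)
ms_cols = ss_cols / (k - 1)
ms_error = ss_error / ((n - 1) * (k - 1))

icc_2_1 = (ms_rows - ms_error) / (
    ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
)
print(f"ICC(2,1): {icc_2_1:.2f}")
```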

Keywords: high-stakes assessment; inter-rater reliability; portfolio; veterinary.

MeSH terms

  • Education, Medical, Undergraduate* / standards
  • Education, Veterinary* / standards
  • Educational Measurement*
  • Female
  • Humans
  • Male
  • Netherlands
  • Reproducibility of Results
  • Students