Meta-Analysis of Interrater Reliability of Supervisory Performance Ratings: Effects of Appraisal Purpose, Scale Type, and Range Restriction

Front Psychol. 2019 Oct 18;10:2281. doi: 10.3389/fpsyg.2019.02281. eCollection 2019.

Abstract

Objectives: This reliability generalization study aimed to estimate the mean and variance of the interrater reliability coefficients (r_yy) of supervisory ratings of overall, task, contextual, and positive job performance. The moderating effects of appraisal purpose and scale type were examined. It was hypothesized that ratings collected for research purposes and ratings made on multi-item scales have higher r_yy. It was also examined whether r_yy was similar across the four performance dimensions.

Method: A database of 224 independent samples was created, and hierarchical sub-grouping meta-analyses were conducted.

Results: Appraisal purpose was a moderator of r_yy for the four performance dimensions. Scale type was a moderator of r_yy for overall and task performance ratings collected for research purposes. The findings also suggest that supervisors have less difficulty evaluating overall job performance than task, contextual, and positive performance. The best estimates of the observed r_yy for overall job performance are 0.61 for ratings collected for research purposes and 0.45 for ratings collected for administrative purposes.

Conclusions: (1) Appraisal purpose moderates r_yy, and researchers and practitioners should be aware of its effects before collecting ratings or using empirically derived interrater reliability distributions; (2) scale type seems to moderate r_yy only for ratings collected for research purposes; (3) overall job performance is rated more reliably than task, contextual, and positive performance. Implications for research and practice are discussed.
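
To illustrate why the choice of reliability estimate matters in practice, the sketch below applies the classical correction for attenuation due to criterion unreliability (an observed correlation divided by the square root of the interrater reliability) using the two overall-performance estimates reported above. The observed validity of 0.30 and the function name are hypothetical, chosen only for illustration; this is not a procedure taken from the study.

```python
import math

# Observed interrater reliability estimates for overall job performance,
# as reported in the abstract.
R_YY_RESEARCH = 0.61
R_YY_ADMINISTRATIVE = 0.45


def correct_for_criterion_unreliability(observed_r: float, r_yy: float) -> float:
    """Classical disattenuation for criterion unreliability: observed_r / sqrt(r_yy)."""
    if not 0.0 < r_yy <= 1.0:
        raise ValueError("r_yy must lie in the interval (0, 1]")
    return observed_r / math.sqrt(r_yy)


# Hypothetical observed validity coefficient (an assumption, not a study result).
observed_validity = 0.30

corrected_research = correct_for_criterion_unreliability(observed_validity, R_YY_RESEARCH)
corrected_administrative = correct_for_criterion_unreliability(
    observed_validity, R_YY_ADMINISTRATIVE
)

print(f"Corrected with research-purpose reliability:       {corrected_research:.3f}")
print(f"Corrected with administrative-purpose reliability: {corrected_administrative:.3f}")
```

Under these assumptions, the same observed correlation yields a noticeably larger corrected value when the lower administrative-purpose reliability is used, which is why the reliability distribution should match the purpose for which the ratings were collected.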

Keywords: appraisal purpose; interrater reliability; meta-analysis; range restriction; scale type; supervisory performance ratings.

Publication types

  • Systematic Review