Weighted kappa for multiple raters

Kenneth J Berry; Janis E Johnston; Paul W Mielke Jr

doi:10.2466/pms.107.3.837-848

Percept Mot Skills. 2008 Dec;107(3):837-48. doi: 10.2466/pms.107.3.837-848.

Authors

Kenneth J Berry¹, Janis E Johnston, Paul W Mielke Jr

Affiliation

¹ Department of Sociology, Colorado State University, Fort Collins, CO 80523-1784, USA. berry@lamar.colostate.edu

PMID: 19235413

Abstract

Five procedures to calculate the probability of weighted kappa with multiple raters under the null hypothesis of independence are described and compared in terms of accuracy, ease of use, generality, and limitations. The five procedures are (1) exact variance, (2) resampling contingency, (3) intraclass correlation, (4) randomized block, and (5) resampling block. While each procedure possesses strengths and limitations, the resampling contingency procedure is shown to be the most versatile and accurate of the five procedures, provided the number of raters is not too large. The resampling contingency procedure permits any weighting scheme, accommodates both symmetrical and asymmetrical weights, is suitable for both weighted and unweighted kappa, and makes no assumptions about either the data distribution or the probability distribution.
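
A minimal Python sketch of the general idea behind a resampling test for multi-rater weighted kappa is given here for illustration. It is not the authors' algorithm, which the abstract does not specify. The sketch assumes that weights are disagreement weights, that multi-rater kappa is one minus the ratio of observed to chance-expected disagreement summed over all rater pairs, and that the null hypothesis of independence is simulated by shuffling each rater's ratings independently, which keeps every rater's marginal distribution fixed. The function names (linear_weights, weighted_kappa, resampling_p_value) and the parameters n_resamples and seed are hypothetical.

```python
import random
from itertools import combinations


def linear_weights(c):
    """Hypothetical helper: linear disagreement weights |a - b| for c ordered categories."""
    return [[abs(a - b) for b in range(c)] for a in range(c)]


def weighted_kappa(ratings, weights):
    """Weighted kappa under the assumptions stated in the lead-in.

    ratings: one list per rater, each holding category indices for the same n objects.
    weights: c x c matrix of disagreement weights (0 on the diagonal).
    """
    n = len(ratings[0])
    c = len(weights)
    # Marginal category proportions for each rater.
    margins = [[col.count(a) / n for a in range(c)] for col in ratings]
    observed = 0.0
    expected = 0.0
    for i, j in combinations(range(len(ratings)), 2):
        # Mean observed disagreement between raters i and j.
        observed += sum(weights[a][b] for a, b in zip(ratings[i], ratings[j])) / n
        # Disagreement expected if raters i and j were independent.
        expected += sum(
            weights[a][b] * margins[i][a] * margins[j][b]
            for a in range(c)
            for b in range(c)
        )
    # Assumes expected > 0, i.e. the raters do not all use a single category.
    return 1.0 - observed / expected


def resampling_p_value(ratings, weights, n_resamples=2000, seed=0):
    """Monte Carlo estimate of P(kappa >= observed kappa) under independence."""
    rng = random.Random(seed)
    observed_kappa = weighted_kappa(ratings, weights)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle each rater's column independently; marginals stay fixed.
        shuffled = [rng.sample(col, len(col)) for col in ratings]
        if weighted_kappa(shuffled, weights) >= observed_kappa:
            hits += 1
    return observed_kappa, hits / n_resamples
```

Because the weight matrix is passed in as an argument, the same sketch covers unweighted kappa (0/1 weights) and asymmetrical weights. The number of rater pairs grows quadratically with the number of raters, so the cost per resample rises accordingly.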

MeSH terms

Humans
Models, Statistical*
Observer Variation
Psychology / statistics & numerical data