Assessing the reliability of ordered categorical scales using kappa-type statistics

Chris Roberts; Roseanne McNamee

doi:10.1191/0962280205sm413oa

Stat Methods Med Res. 2005 Oct;14(5):493-514. doi: 10.1191/0962280205sm413oa.

Authors

Chris Roberts¹, Roseanne McNamee

Affiliation

¹ Division of Epidemiology and Health Sciences, Stopford Building, The University of Manchester, Manchester, UK. chris.roberts@manchester.ac.uk

PMID: 16248350
DOI: 10.1191/0962280205sm413oa

Abstract

Methods for the analysis of the reliability of ordered categorical scales are discussed, focussing on the limitations of a single summary weighted kappa coefficient. A symmetric matrix of kappa-type coefficients is suggested as an alternative. The method is proposed as being suitable for ordinal scales where there is no underlying continuum. Its application is illustrated using two data sets from reliability studies. If, instead, distances between categories can be specified, a weighted mean of the matrix terms can be used as a summary measure. This is equal to a weighted kappa coefficient with squared weights, provided distances between adjacent categories are equal. When a study design corresponds to a one-way random effects model, estimates of the precision of kappa-type coefficients, including the coefficients described here, can be obtained using the delta method, bootstrap resampling by subjects or jack-knifing by subjects. In the case of interobserver reliability studies, where there may be systematic differences between observers, the investigator may wish to generalise to a population of observers and subjects. In this case, jack-knifing by observer and subject is suggested. Empirical comparisons are made between standard error estimates based on the delta method, on jack-knifing by subjects and on a two-way jack-knife by subjects and observers. The results suggest that standard errors based on the delta method or on jack-knifing by subjects alone may be too small, overstating the precision of the estimates.
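
As an illustration of two ideas named in the abstract, the sketch below computes a weighted kappa with squared (quadratic) weights for equally spaced categories and a delete-one-subject jack-knife standard error. It is a minimal sketch, not the authors' method: it uses the standard Cohen weighted kappa for two raters rather than the matrix of kappa-type coefficients the paper proposes, it does not implement the two-way jack-knife by subjects and observers, and the function names and example ratings are hypothetical.

```python
from math import sqrt


def weighted_kappa(pairs, n_categories):
    """Cohen's weighted kappa with squared-distance disagreement weights.

    pairs: list of (rating_a, rating_b), categories coded 0..n_categories-1
    and assumed equally spaced (an assumption of this sketch).
    """
    n = len(pairs)
    row = [0.0] * n_categories  # marginal proportions for rater A
    col = [0.0] * n_categories  # marginal proportions for rater B
    observed = 0.0              # mean observed squared disagreement
    for a, b in pairs:
        row[a] += 1 / n
        col[b] += 1 / n
        observed += (a - b) ** 2 / n
    # Squared disagreement expected if the two ratings were independent.
    expected = sum(
        row[i] * col[j] * (i - j) ** 2
        for i in range(n_categories)
        for j in range(n_categories)
    )
    if expected == 0:
        raise ValueError("kappa is undefined: all ratings fall in one category")
    return 1.0 - observed / expected


def jackknife_by_subject(pairs, n_categories):
    """Return (kappa, jack-knife standard error), deleting one subject at a time."""
    n = len(pairs)
    full = weighted_kappa(pairs, n_categories)
    leave_one_out = [
        weighted_kappa(pairs[:i] + pairs[i + 1:], n_categories) for i in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((k - mean_loo) ** 2 for k in leave_one_out)
    return full, sqrt(variance)


# Hypothetical ratings of 8 subjects by two observers on a 3-point ordinal scale.
example = [(0, 0), (1, 1), (2, 1), (0, 1), (2, 2), (1, 1), (1, 2), (0, 0)]
kappa, se = jackknife_by_subject(example, 3)
print(f"weighted kappa = {kappa:.3f}, jack-knife SE = {se:.3f}")
```

Because this jack-knife deletes subjects only, it treats the observers as fixed. That is the situation in which, according to the abstract, the resulting standard error may understate the uncertainty when the aim is to generalise to a population of observers.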

MeSH terms

Biomedical Research / statistics & numerical data
Models, Statistical*
Reproducibility of Results*
Research Design / statistics & numerical data*
United Kingdom