Assessing the inter-rater agreement for ordinal data through weighted indexes

Stat Methods Med Res. 2016 Dec;25(6):2611-2633. doi: 10.1177/0962280214529560. Epub 2014 Apr 16.

Abstract

Assessing inter-rater agreement between observers, in the case of ordinal variables, is an important issue in both statistical theory and biomedical applications. Typically, this problem has been addressed using Cohen's weighted kappa, a modification of the original kappa statistic, which was proposed for nominal variables in the case of two observers. Fleiss (1971) put forth a generalization of kappa to the case of multiple observers, but both Cohen's and Fleiss' kappa can behave paradoxically, which may make their magnitude difficult to interpret. In this paper, a modification of Fleiss' kappa, not affected by paradoxes, is proposed and subsequently generalized to the case of ordinal variables. Monte Carlo simulations are used both to test statistical hypotheses and to calculate percentile and bootstrap-t confidence intervals based on this statistic. The asymptotic normality of the proposed statistic is demonstrated. Our results are applied to the classical dataset of Holmquist et al. (1967) on the classification, by multiple observers, of carcinoma in situ of the uterine cervix. Finally, we generalize the use of s* to a bivariate case.
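
As background for the paradox mentioned in the abstract, the following is a minimal sketch of the standard Fleiss (1971) kappa for nominal categories. It is not the modified statistic proposed in the paper, whose definition the abstract does not give. The function name `fleiss_kappa` and the toy count matrix are illustrative assumptions, not taken from the paper or from the Holmquist et al. dataset.

```python
def fleiss_kappa(counts):
    """Standard Fleiss' kappa (illustrative sketch, not the paper's statistic).

    counts[i][j] is the number of raters who assigned subject i to category j.
    Every subject must be rated by the same number of raters (at least two).
    """
    n_subjects = len(counts)
    n_categories = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    pairs = n_raters * (n_raters - 1)
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / pairs for row in counts
    ) / n_subjects

    # Chance agreement: based on the overall share of ratings in each category.
    total = n_subjects * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_exp = sum(s * s for s in shares)

    # Undefined (division by zero) when all ratings fall in a single category.
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical data: 10 subjects, 3 raters, 2 categories. The raters agree
# unanimously on 9 subjects, but almost all ratings fall in the first category.
skewed = [[3, 0]] * 9 + [[2, 1]]
print(fleiss_kappa(skewed))
```

In this toy example the raters agree on nearly every subject, yet the resulting kappa is close to zero because the chance-agreement term is almost as large as the observed agreement. This is the kind of behavior that makes the magnitude of kappa hard to interpret when the category distribution is strongly unbalanced.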

Keywords: Fleiss’ kappa; inter-rater agreement; multiple observers; ordinal variables; weighted indexes.

MeSH terms

  • Carcinoma in Situ / classification
  • Carcinoma in Situ / diagnosis
  • Carcinoma in Situ / pathology
  • Female
  • Humans
  • Monte Carlo Method
  • Observer Variation*
  • Reproducibility of Results
  • Uterine Cervical Neoplasms / classification
  • Uterine Cervical Neoplasms / diagnosis
  • Uterine Cervical Neoplasms / pathology