Interrater reliability for multilevel data: A generalizability theory approach

Psychol Methods. 2022 Aug;27(4):650-666. doi: 10.1037/met0000391. Epub 2021 Apr 5.

Abstract

Current interrater reliability (IRR) coefficients ignore the nested structure of multilevel observational data, resulting in biased estimates of both subject- and cluster-level IRR. We used generalizability theory to provide a conceptualization and estimation method for IRR of continuous multilevel observational data. We explain how generalizability theory decomposes the variance of multilevel observational data into subject-, cluster-, and rater-related components, which can be estimated using Markov chain Monte Carlo (MCMC) estimation. We explain how IRR coefficients for each level can be derived from these variance components, and how they can be estimated as intraclass correlation coefficients (ICC). We assessed the quality of MCMC point and interval estimates with a simulation study, and showed that small numbers of raters were the main source of bias and inefficiency of the ICCs. In a follow-up simulation, we showed that a planned missing data design can diminish most estimation difficulties in these conditions, yielding a useful approach to estimating multilevel interrater reliability for most social and behavioral research. We illustrated the method using data on student-teacher relationships. All software code and data used for this article is available on the Open Science Framework: https://osf.io/bwk5t/. (PsycInfo Database Record (c) 2022 APA, all rights reserved).

MeSH terms

  • Behavioral Research*
  • Bias
  • Humans
  • Monte Carlo Method
  • Reproducibility of Results
  • Research Design*