Multilevel Reliability Measures of Latent Scores Within an Item Response Theory Framework

Multivariate Behav Res. 2019 Nov-Dec;54(6):856-881. doi: 10.1080/00273171.2019.1596780. Epub 2019 Jun 19.

Abstract

This paper evaluated multilevel reliability measures in two-level nested designs (e.g., students nested within teachers) within an item response theory framework. A simulation study investigated the behavior of the multilevel reliability measures and the uncertainty associated with them across multilevel designs that varied in the number of clusters, cluster size, and intraclass correlation (ICC), and across test lengths, for two parameterizations of multilevel item response models: separate item discriminations over levels, or the same item discrimination over levels. Marginal maximum likelihood estimation (MMLE)-multiple imputation and Bayesian analysis were employed to evaluate the accuracy of the multilevel reliability measures and the empirical coverage rates of Monte Carlo (MC) confidence or credible intervals. Considering both the accuracy of the measures and the empirical coverage rates of the intervals, the results lead us to generally recommend MMLE-multiple imputation. In the model with separate item discriminations over levels, marginally acceptable accuracy of the multilevel reliability measures and empirical coverage rates of the MC confidence intervals were found under MMLE-multiple imputation only in a limited condition (200 clusters, a cluster size of 30, an ICC of .2, and 40 items). In the model with the same item discrimination over levels, the accuracy of the multilevel reliability measures and the empirical coverage rates of the MC confidence intervals were acceptable under MMLE-multiple imputation in all multilevel designs we considered with 40 items. We discuss these findings and provide guidelines for reporting multilevel reliability measures.
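
Two quantities named in the abstract, the ICC and the empirical coverage rate of an interval, can be illustrated with a minimal sketch. It is not the authors' code: the function names and toy numbers are hypothetical, the ICC uses the generic variance-ratio definition (between-cluster variance over total variance), and the coverage rate is taken as the share of simulation replications whose interval contains the true value. The paper's own definitions under the multilevel item response models may differ.

```python
from typing import Sequence, Tuple


def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Generic ICC: between-cluster variance divided by total variance.

    Assumption: a plain variance-ratio definition, used here only for
    illustration; it is not taken from the paper.
    """
    total = between_var + within_var
    if between_var < 0 or within_var < 0 or total <= 0:
        raise ValueError("variances must be non-negative with a positive sum")
    return between_var / total


def empirical_coverage(
    intervals: Sequence[Tuple[float, float]], true_value: float
) -> float:
    """Proportion of replications whose interval contains the true value."""
    if not intervals:
        raise ValueError("at least one interval is required")
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


# Hypothetical toy inputs (not results from the study).
icc = intraclass_correlation(between_var=0.25, within_var=1.0)
toy_intervals = [(0.60, 0.80), (0.70, 0.90), (0.76, 0.90), (0.50, 0.74)]
coverage = empirical_coverage(toy_intervals, true_value=0.75)
print(f"ICC = {icc:.2f}, empirical coverage = {coverage:.2f}")
```

In a simulation study of this kind, the empirical coverage rate would be compared with the nominal level of the interval (for example, .95) across many replications per condition.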

Keywords: Bayesian analysis; item response theory; marginal maximum likelihood estimation; multilevel model; multiple imputation; reliability coefficient.

MeSH terms

  • Bayes Theorem
  • Humans
  • Likelihood Functions*
  • Monte Carlo Method
  • Multilevel Analysis*
  • Psychological Theory
  • Reproducibility of Results*
  • Surveys and Questionnaires