Methods for Estimating Item-Score Reliability

Eva A O Zijlmans; L Andries van der Ark; Jesper Tijmstra; Klaas Sijtsma

doi:10.1177/0146621618758290


Appl Psychol Meas. 2018 Oct;42(7):553-570. doi: 10.1177/0146621618758290. Epub 2018 Apr 9.

Authors

Eva A O Zijlmans¹, L Andries van der Ark², Jesper Tijmstra¹, Klaas Sijtsma¹

Affiliations

¹ Tilburg University, Tilburg, Netherlands.
² University of Amsterdam, Amsterdam, Netherlands.

Abstract

Reliability is usually estimated for a test score, but it can also be estimated for item scores. Item-score reliability can be useful for assessing an item's contribution to the reliability of the test score, for identifying unreliable scores in aberrant item-score patterns in person-fit analysis, and for selecting the most reliable item from a test to use as a single-item measure. Four methods for estimating item-score reliability were discussed: the Molenaar-Sijtsma method (method MS), Guttman's method λ6, the latent class reliability coefficient (method LCRC), and the correction for attenuation (method CA). A simulation study compared the methods with respect to median bias, variability (interquartile range [IQR]), and percentage of outliers. The simulation study consisted of six conditions: standard, polytomous items, unequal α parameters, two-dimensional data, long test, and small sample size. Methods MS and CA were the most accurate. Method LCRC showed almost unbiased results but large variability. Method λ6 consistently underestimated item-score reliability but showed a smaller IQR than the other methods.
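Two of the four estimators, method λ6 and method CA, have closed forms simple enough to sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. Method λ6 for item j is taken as the squared multiple correlation of item j on the other items, 1 − e_j²/σ_j², where e_j² is the residual variance from that regression. Method CA starts from the correction-for-attenuation formula and assumes the true scores of item j and of the rest score correlate perfectly, which gives ρ_jj = r(X_j, R_j)² / ρ_RR. Using the rest score R_j, estimating ρ_RR with coefficient α, and flagging outliers with Tukey's 1.5 × IQR fences are all assumptions of this sketch. `simulate` is a hypothetical helper that generates continuous one-factor data for illustration; the study itself used IRT-generated item scores. Methods MS and LCRC need nonparametric IRT or latent class modelling and are not sketched here.

```python
import math
import random
import statistics


def cov_matrix(rows):
    """Sample covariance matrix of a persons-by-items score matrix (list of rows)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[a] for r in rows) / n for a in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting; assumes m is nonsingular."""
    size = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == c else 0.0 for c in range(size)]
           for i, row in enumerate(m)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def lambda6_item(rows, j):
    """Item version of lambda6 as a squared multiple correlation:
    1 - e_j^2 / s_j^2, using the identity e_j^2 = 1 / (S^-1)_jj."""
    s = cov_matrix(rows)
    return 1.0 - 1.0 / (s[j][j] * invert(s)[j][j])


def alpha(s):
    """Coefficient alpha from a covariance matrix (needs at least 2 items)."""
    k = len(s)
    total = sum(sum(row) for row in s)
    return k / (k - 1) * (1.0 - sum(s[i][i] for i in range(k)) / total)


def ca_item(rows, j):
    """Correction for attenuation: r(X_j, R_j)^2 / rho_RR, where R_j is the rest
    score and rho_RR is estimated by coefficient alpha (an assumption)."""
    others = [i for i in range(len(rows[0])) if i != j]
    pair = [[r[j], sum(r[i] for i in others)] for r in rows]
    c = cov_matrix(pair)
    r_jr = c[0][1] / math.sqrt(c[0][0] * c[1][1])
    return r_jr ** 2 / alpha(cov_matrix([[r[i] for i in others] for r in rows]))


def summarize(estimates, true_value):
    """Median bias, IQR, and percentage of outliers (Tukey 1.5*IQR fences, assumed)."""
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    pct_out = 100.0 * sum(not lo <= e <= hi for e in estimates) / len(estimates)
    return statistics.median([e - true_value for e in estimates]), iqr, pct_out


def simulate(n, k, error_sd=1.0, seed=1):
    """Hypothetical one-factor data: X_ij = theta_i + E_ij with theta ~ N(0, 1).
    Every item's true reliability is 1 / (1 + error_sd**2)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        rows.append([theta + rng.gauss(0.0, error_sd) for _ in range(k)])
    return rows
```

With `simulate(3000, 5)` each item's true reliability is 0.5. Method CA should land near 0.5, whereas λ6 should land near 0.4, the population squared multiple correlation for five parallel items with inter-item correlation 0.5. This gap illustrates the underestimation the abstract reports for λ6.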

Keywords: Guttman’s method λ6; correction for attenuation; item-score reliability; latent class reliability coefficient; method MS.