Modeling Booklet Effects for Nonequivalent Group Designs in Large-Scale Assessment

Educ Psychol Meas. 2015 Aug;75(4):568-584. doi: 10.1177/0013164414554219. Epub 2014 Nov 3.

Abstract

Multiple matrix designs are commonly used in large-scale assessments to distribute test items to students. These designs comprise several booklets, each containing a subset of the complete item pool. Besides reducing the test burden on individual students, using several booklets allows the difficulty of the presented items to be aligned with the assumed performance level of the examined subgroups. While this may improve measurement precision and students' test-taking motivation, using several booklets might influence response behavior and thus constitute a potential source of unwanted variation. To provide guidance on identifying and modeling booklet effects, this study presents statistical models that account for booklet effects and applies them in a large-scale assessment setting. Three models are derived from the Rasch model within the generalized linear mixed models framework. The models were applied to data from a national educational standards assessment study of scientific competence. A total of 1,021 items were compiled into 74 booklets, which were distributed to a sample of 9,044 students in Grades 9 and 10. The results revealed a small but nonnegligible booklet effect. For future large-scale assessment studies, it is recommended to examine whether booklet effects occur and, where necessary, to account for them adequately in subsequent analyses.

Keywords: context effects; generalized linear mixed models (GLMM); large-scale assessment; multiple matrix sampling; nonequivalent groups; testing.
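
As a rough illustration of what extending the Rasch model with a booklet effect can look like in the generalized linear mixed models framework, the sketch below writes the standard Rasch model in logit form and then adds a single booklet term. The notation (person p, item i, booklet b, and the symbols θ, β, γ) is assumed here for illustration only; the abstract does not specify the three models, so this is not a statement of the authors' parameterization.

```latex
% Standard Rasch model as a GLMM with a logit link:
% person ability \theta_p is a random effect, item difficulty \beta_i is fixed.
\operatorname{logit} P(X_{pi} = 1 \mid \theta_p) = \theta_p - \beta_i,
\qquad \theta_p \sim N(0, \sigma_\theta^2)

% One possible extension (assumed, not taken from the source):
% an additive main effect \gamma_b for the booklet b in which
% person p worked on item i.
\operatorname{logit} P(X_{pib} = 1 \mid \theta_p, \gamma_b) = \theta_p - \beta_i + \gamma_b

% \gamma_b may be treated as a fixed effect (with an identification
% constraint such as \sum_b \gamma_b = 0) or as a random effect:
\gamma_b \sim N(0, \sigma_\gamma^2)
```

Under this assumed parameterization, a booklet effect of zero (all γ_b = 0, or σ_γ² = 0 in the random-effect version) reduces the extended model to the ordinary Rasch model, so the size of the booklet term indicates how much response behavior varies across booklets beyond person ability and item difficulty.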