Polytomous Testlet Response Models for Technology-Enhanced Innovative Items: Implications on Model Fit and Trait Inference

Hyeon-Ah Kang; Suhwa Han; Doyoung Kim; Shu-Chuan Kao

doi:10.1177/00131644211032261

Polytomous Testlet Response Models for Technology-Enhanced Innovative Items: Implications on Model Fit and Trait Inference

Educ Psychol Meas. 2022 Aug;82(4):811-838. doi: 10.1177/00131644211032261. Epub 2021 Aug 2.

Authors

Hyeon-Ah Kang¹, Suhwa Han¹, Doyoung Kim², Shu-Chuan Kao²

Affiliations

¹ University of Texas at Austin, Austin, TX, USA.
² National Council of State Boards of Nursing, Chicago, IL, USA.

Abstract

The development of technology-enhanced innovative items calls for practical models that can describe polytomous testlet items. In this study, we evaluate four measurement models that can characterize polytomous items administered in testlets: (a) generalized partial credit model (GPCM), (b) testlet-as-a-polytomous-item model (TPIM), (c) random-effect testlet model (RTM), and (d) fixed-effect testlet model (FTM). Using data from GPCM, FTM, and RTM, we examine performance of the scoring models in multiple aspects: relative model fit, absolute item fit, significance of testlet effects, parameter recovery, and classification accuracy. The empirical analysis suggests that relative performance of the models varies substantially depending on the testlet-effect type, effect size, and trait estimator. When testlets had no or fixed effects, GPCM and FTM led to most desirable measurement outcomes. When testlets had random interaction effects, RTM demonstrated best model fit and yet showed substantially different performance in the trait recovery depending on the estimator. In particular, the advantage of RTM as a scoring model was discernable only when there existed strong random effects and the trait levels were estimated with Bayes priors. In other settings, the simpler models (i.e., GPCM, FTM) performed better or comparably. The study also revealed that polytomous scoring of testlet items has limited prospect as a functional scoring method. Based on the outcomes of the empirical evaluation, we provide practical guidelines for choosing a measurement model for polytomous innovative items that are administered in testlets.

Keywords: innovative items; item response theory; polytomous items; technology-enhanced assessment; testlet.