Rasch Versus Classical Equating in the Context of Small Sample Sizes

Educ Psychol Meas. 2020 Jun;80(3):499-521. doi: 10.1177/0013164419878483. Epub 2019 Sep 30.

Abstract

Equating and scaling in the context of small-sample exams, such as credentialing exams for highly specialized professions, have received increased attention in recent research. Investigators have proposed a variety of both classical and Rasch-based approaches to the problem. This study attempts to extend past research by (1) directly comparing classical and Rasch techniques for equating exam scores when sample sizes are small (N ≤ 100 per exam form) and (2) attempting to pool multiple forms' worth of data to improve estimation in the Rasch framework. We simulated multiple years of a small-sample exam program by resampling from a larger certification exam program's real data. Results showed that combining multiple administrations' worth of data via the Rasch model can lead to more accurate equating than classical methods designed to work well in small samples. WINSTEPS-based Rasch methods that used multiple exam forms' data outperformed Bayesian Markov chain Monte Carlo (MCMC) methods, because the prior distribution used to estimate the item difficulty parameters biased predicted scores when the exam forms differed in difficulty.
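The resampling design described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual procedure: it assumes two large "population" score sets standing in for the certification program's real data, and uses simple mean equating (a stand-in for the classical and Rasch methods the study compares) so the small-sample estimate can be checked against a full-data criterion. The function name and the RMSE summary are our own choices for the sketch.

```python
import random
import statistics

def simulate_small_sample_equating(scores_x, scores_y, n=100, reps=500, seed=0):
    """Resampling sketch: repeatedly draw small samples (size n, here
    n <= 100 per form) from two large score sets and apply mean
    equating to each draw, then summarize error against the
    full-data criterion.

    scores_x, scores_y: full-data scores on the two exam forms
    (stand-ins for the large certification data set the study
    resampled from). Mean equating is used only for illustration.
    """
    rng = random.Random(seed)
    # Criterion: equating constant computed on the full data.
    criterion = statistics.mean(scores_y) - statistics.mean(scores_x)
    errors = []
    for _ in range(reps):
        sx = rng.choices(scores_x, k=n)  # resample examinees with replacement
        sy = rng.choices(scores_y, k=n)
        est = statistics.mean(sy) - statistics.mean(sx)  # small-sample mean equating
        errors.append(est - criterion)
    # Root mean squared error of the small-sample equating constant.
    return (sum(e * e for e in errors) / reps) ** 0.5
```

With synthetic score distributions, the returned RMSE shrinks as `n` grows, which is the basic small-sample phenomenon the study investigates.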

Keywords: Markov chain Monte Carlo (MCMC); Rasch model; circle-arc equating; equating; linking; nominal weights mean equating; small samples.