Scale length does matter: Recommendations for measurement invariance testing with categorical factor analysis and item response theory approaches

Behav Res Methods. 2022 Oct;54(5):2114-2145. doi: 10.3758/s13428-021-01690-7. Epub 2021 Dec 15.

Abstract

In the social sciences, the study of group differences concerning latent constructs is ubiquitous. These constructs are generally measured by means of scales composed of ordinal items. In order to compare these constructs across groups, one crucial requirement is that they are measured equivalently or, in technical jargon, that measurement invariance (MI) holds across the groups. This study compared the performance of scale- and item-level approaches based on multiple group categorical confirmatory factor analysis (MG-CCFA) and multiple group item response theory (MG-IRT) in testing MI with ordinal data. In general, the results of the simulation studies showed that MG-CCFA-based approaches outperformed MG-IRT-based approaches when testing MI at the scale level, whereas, at the item level, the best-performing approach depended on which parameter was tested (i.e., loadings or thresholds). That is, when testing the equivalence of loadings, the likelihood ratio test provided the best trade-off between true-positive rate and false-positive rate, whereas, when testing the equivalence of thresholds, the χ2 test outperformed the other testing strategies. In addition, the performance of MG-CCFA's fit measures, such as RMSEA and CFI, seemed to depend largely on the length of the scale, especially when MI was tested at the item level. General caution is recommended when using these measures, especially when MI is tested for each item individually.
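As a rough illustration of the likelihood ratio (chi-square difference) testing strategy mentioned above, the Python sketch below compares a constrained multiple-group model (loadings or thresholds held equal across groups) with a less constrained one via their chi-square fit statistics. The function name and the numeric fit values are hypothetical examples, not output of the authors' study; the only assumption is access to SciPy.

    from scipy.stats import chi2

    def lrt_chisq_difference(chisq_constrained, df_constrained,
                             chisq_free, df_free):
        # Chi-square difference (likelihood ratio) test for two nested
        # multiple-group models: the constrained model fixes loadings (or
        # thresholds) to be equal across groups, the free model does not.
        delta_chisq = chisq_constrained - chisq_free
        delta_df = df_constrained - df_free
        p_value = chi2.sf(delta_chisq, delta_df)  # upper-tail probability
        return delta_chisq, delta_df, p_value

    # Hypothetical fit statistics: a small p-value would indicate that the
    # constrained (invariant) model fits significantly worse than the free
    # model, i.e., evidence against measurement invariance.
    print(lrt_chisq_difference(chisq_constrained=135.2, df_constrained=90,
                               chisq_free=118.7, df_free=84))

In practice, the chi-square statistics and degrees of freedom would come from fitting the two nested MG-CCFA (or MG-IRT) models in dedicated software; the sketch only shows how the difference test itself is evaluated.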

Keywords: CFA (confirmatory factor analysis); Categorical data; DIF (differential item functioning); IRT (item response theory); Measurement invariance.

MeSH terms

  • Factor Analysis, Statistical*
  • Humans
  • Psychometrics / methods