Scale length does matter: Recommendations for measurement invariance testing with categorical factor analysis and item response theory approaches

Behav Res Methods. 2022 Oct;54(5):2114-2145. doi: 10.3758/s13428-021-01690-7. Epub 2021 Dec 15.

Abstract

In the social sciences, the study of group differences concerning latent constructs is ubiquitous. These constructs are generally measured by means of scales composed of ordinal items. In order to compare these constructs across groups, one crucial requirement is that they are measured equivalently or, in technical jargon, that measurement invariance (MI) holds across the groups. This study compared the performance of scale- and item-level approaches based on multiple group categorical confirmatory factor analysis (MG-CCFA) and multiple group item response theory (MG-IRT) in testing MI with ordinal data. In general, the results of the simulation studies showed that MG-CCFA-based approaches outperformed MG-IRT-based approaches when testing MI at the scale level, whereas, at the item level, the best-performing approach depended on which parameter was tested (i.e., loadings or thresholds). That is, when testing the equivalence of loadings, the likelihood ratio test provided the best trade-off between true-positive rate and false-positive rate, whereas, when testing the equivalence of thresholds, the χ2 test outperformed the other testing strategies. In addition, the performance of MG-CCFA's fit measures, such as RMSEA and CFI, seemed to depend largely on the length of the scale, especially when MI was tested at the item level. General caution is recommended when using these measures, especially when MI is tested for each item individually.
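As a rough illustration of the likelihood ratio (chi-square difference) testing strategy mentioned above, the Python sketch below compares a constrained multiple-group model (loadings or thresholds held equal across groups) with a less constrained one via their chi-square fit statistics. The function name and the numeric fit values are hypothetical examples, not output of the authors' study; the only assumption is access to SciPy.

    from scipy.stats import chi2

    def lrt_chisq_difference(chisq_constrained, df_constrained,
                             chisq_free, df_free):
        # Chi-square difference (likelihood ratio) test for two nested
        # multiple-group models: the constrained model fixes loadings (or
        # thresholds) to be equal across groups, the free model does not.
        delta_chisq = chisq_constrained - chisq_free
        delta_df = df_constrained - df_free
        p_value = chi2.sf(delta_chisq, delta_df)  # upper-tail probability
        return delta_chisq, delta_df, p_value

    # Hypothetical fit statistics: a small p-value would indicate that the
    # constrained (invariant) model fits significantly worse than the free
    # model, i.e., evidence against measurement invariance.
    print(lrt_chisq_difference(chisq_constrained=135.2, df_constrained=90,
                               chisq_free=118.7, df_free=84))

In practice, the chi-square statistics and degrees of freedom would come from fitting the two nested MG-CCFA (or MG-IRT) models in dedicated software; the sketch only shows how the difference test itself is evaluated.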

Keywords: CFA (confirmatory factor analysis); Categorical data; DIF (differential item functioning); IRT (item response theory); Measurement invariance.

MeSH terms

  • Factor Analysis, Statistical*
  • Humans
  • Psychometrics / methods