Factor Retention Using Machine Learning With Ordinal Data

David Goretzko; Markus Bühner

doi:10.1177/01466216221089345

Appl Psychol Meas. 2022 Jul;46(5):406-421. doi: 10.1177/01466216221089345. Epub 2022 May 4.

Authors

David Goretzko¹, Markus Bühner¹

Affiliation

¹ LMU Munich, München, Germany.

Abstract

Determining the number of factors in exploratory factor analysis is probably the most crucial decision when conducting the analysis, as it directly influences the meaningfulness of the results (i.e., factorial validity). A new method called the Factor Forest, which combines data simulation and machine learning, has been developed recently. This simulation-based method reached very high accuracy for multivariate normal data, but it has not yet been tested with ordinal data. Hence, in this simulation study, we evaluated the Factor Forest with ordinal data based on different numbers of categories (2-6 categories) and compared it to common factor retention criteria. It showed higher overall accuracy for all types of ordinal data than all common factor retention criteria that were used for comparison (Parallel Analysis, Comparison Data, the Empirical Kaiser Criterion, and the Kaiser-Guttman Rule). The results indicate that the Factor Forest is applicable to ordinal data with at least five categories (a typical scale in questionnaire research) in the majority of conditions, and to binary or ordinal data based on items with fewer categories when the sample size is large.

Keywords: exploratory factor analysis; factor retention; factorial validity; machine learning; number of factors; ordinal data.
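
Illustrative sketch (not the authors' code): the abstract describes evaluating factor retention criteria on ordinal data with 2-6 categories. The Python example below shows, under stated assumptions, how such data might be generated and how the simplest of the comparison criteria, the Kaiser-Guttman Rule (retain as many factors as there are correlation-matrix eigenvalues greater than 1), can be applied. The two-factor population model, the loading of 0.8, the equal-frequency thresholds, the sample size, and all function names are hypothetical choices made for illustration; they are not taken from the study. Pearson correlations are used for simplicity, although ordinal data are often analyzed with polychoric correlations instead. The Factor Forest itself is not reproduced here.

```python
import numpy as np


def simulate_two_factor_data(n_obs, rng, loading=0.8):
    """Draw continuous scores for 6 items from an orthogonal two-factor model.

    Assumption for illustration: 3 items per factor, equal loadings,
    unit item variances.
    """
    loadings = np.zeros((6, 2))
    loadings[:3, 0] = loading
    loadings[3:, 1] = loading
    factors = rng.standard_normal((n_obs, 2))
    unique_sd = np.sqrt(1.0 - loading**2)
    noise = rng.standard_normal((n_obs, 6)) * unique_sd
    return factors @ loadings.T + noise


def discretize(scores, n_categories):
    """Cut each column into ordinal categories 0..n_categories-1.

    Assumption for illustration: thresholds are empirical quantiles,
    so every category is (roughly) equally frequent.
    """
    probs = np.linspace(0, 1, n_categories + 1)[1:-1]
    ordinal = np.empty(scores.shape, dtype=int)
    for j in range(scores.shape[1]):
        thresholds = np.quantile(scores[:, j], probs)
        ordinal[:, j] = np.digitize(scores[:, j], thresholds)
    return ordinal


def kaiser_guttman(corr):
    """Count the eigenvalues of a correlation matrix that exceed 1."""
    eigenvalues = np.linalg.eigvalsh(corr)
    return int(np.sum(eigenvalues > 1.0))


rng = np.random.default_rng(0)
continuous = simulate_two_factor_data(2000, rng)

# Apply the rule to ordinal versions of the same data with 2 to 6 categories.
for k in range(2, 7):
    ordinal = discretize(continuous, k)
    corr = np.corrcoef(ordinal, rowvar=False)
    print(f"{k} categories: {kaiser_guttman(corr)} factor(s) suggested")
```

This clean, strongly loaded setup is an easy case. The conditions named in the abstract, such as few categories combined with smaller samples, are where retention criteria are harder to apply, and a sketch like this could be extended by varying n_obs and the loading.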