Exploring the Machine Learning Paradigm in Determining Risk for Reading Disability

Sci Stud Read. 2023;27(1):5-20. doi: 10.1080/10888438.2022.2115914. Epub 2022 Dec 22.

Abstract

Purpose: Researchers have developed a constellation model of decoding-related reading disabilities (RD) to improve RD risk determination. The model's hallmark is its use of multiple RD indicators to determine RD risk. Classification methods such as logistic regression (LR) are one way to determine RD risk within the constellation model framework. However, applying LR can raise issues (e.g., multicollinearity among predictors). Machine learning techniques, such as random forest (RF), might help overcome these limitations because they can handle complex data relations better than traditional approaches. We examined the prediction performance of RF and compared it with that of LR in determining RD risk.

Method: The sample comprised 12,171 students from Florida whose third-grade RD risk was operationalized using the constellation model with one, two, three, or four RD indicators in first and second grade.
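
As a rough illustration of how a multi-indicator risk rule of the kind described above might be expressed, the following minimal Python sketch counts how many indicators fall below a cutoff and flags risk when that count reaches a required number. The indicator names, the cutoff value, and the counting rule itself are hypothetical assumptions made for illustration; the abstract does not specify them.

```python
from typing import Dict

# Hypothetical placeholders: the abstract does not name the indicators
# or state the cutoff used to flag each one.
INDICATORS = ("indicator_1", "indicator_2", "indicator_3", "indicator_4")
CUTOFF_PERCENTILE = 25.0  # assumed cutoff, for illustration only


def count_flagged(scores: Dict[str, float], cutoff: float = CUTOFF_PERCENTILE) -> int:
    """Count the indicators whose percentile score falls below the cutoff."""
    return sum(1 for name in INDICATORS if scores[name] < cutoff)


def at_risk(scores: Dict[str, float], required: int, cutoff: float = CUTOFF_PERCENTILE) -> bool:
    """Flag risk when at least `required` indicators fall below the cutoff."""
    return count_flagged(scores, cutoff) >= required


# Made-up percentile scores for one student.
student = {
    "indicator_1": 18.0,
    "indicator_2": 30.0,
    "indicator_3": 12.0,
    "indicator_4": 40.0,
}

# Risk status under rules requiring one, two, three, or four indicators.
risk_by_rule = {k: at_risk(student, k) for k in (1, 2, 3, 4)}
print(risk_by_rule)
```

Under these assumptions, a stricter rule (more required indicators) flags fewer students, which is why the same student can be at risk under one definition and not under another.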

Results: LR and RF performed on par in accurately predicting RD risk. Regarding predictor importance, reading fluency was consistently the most important predictor of RD risk.
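
To make "on par" concrete, the sketch below shows one way the predictions of two classifiers could be compared on the same held-out students using accuracy, sensitivity, and specificity. The choice of metrics and the toy labels are assumptions for illustration only; the abstract does not report which performance measures were used, and the numbers here are not study data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary labels (1 = at risk)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up labels and predictions for eight students (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
lr_pred = [1, 1, 0, 0, 0, 0, 1, 0]  # hypothetical LR predictions
rf_pred = [1, 0, 1, 0, 0, 0, 0, 1]  # hypothetical RF predictions

lr_metrics = classification_metrics(y_true, lr_pred)
rf_metrics = classification_metrics(y_true, rf_pred)
print("LR:", lr_metrics)
print("RF:", rf_metrics)
```

In this toy case the two sets of predictions misclassify different students yet yield identical summary metrics, which illustrates that models can perform on par overall without making the same individual predictions.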

Conclusion: Findings suggest that RF does not outperform LR in RD prediction accuracy in models with multiple linearly related predictors. Findings also highlight the value of including reading fluency in early identification batteries for later RD determination.

Keywords: logistic regression; prediction performance; random forest; reading disability.