Exploring the Machine Learning Paradigm in Determining Risk for Reading Disability

Florina Erbeli; Kai He; Connor Cheek; Marianne Rice; Xiaoning Qian

doi:10.1080/10888438.2022.2115914

Sci Stud Read. 2023;27(1):5-20. doi: 10.1080/10888438.2022.2115914. Epub 2022 Dec 22.

Authors

Florina Erbeli¹, Kai He², Connor Cheek³, Marianne Rice¹, Xiaoning Qian²

Affiliations

¹ Department of Educational Psychology, Texas A&M University.
² Department of Electrical and Computer Engineering, Texas A&M University.
³ Department of Physics, University of Houston.

Abstract

Purpose: Researchers have developed a constellation model of decoding-related reading disabilities (RD) to improve RD risk determination. The model's hallmark is its use of multiple RD indicators to determine risk. Classification methods such as logistic regression (LR) are one way to determine RD risk within the constellation model framework. However, applying LR can raise issues such as multicollinearity among predictors. Machine learning techniques such as random forest (RF), which can handle complex data relations better than traditional approaches, might help overcome these limitations. We examined the prediction performance of RF and compared it with that of LR in determining RD risk.
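As an illustration of the multicollinearity concern raised above, the following is a minimal sketch, not the authors' analysis. It assumes only two predictors, in which case the variance inflation factor reduces to 1 / (1 - r^2), where r is the Pearson correlation between them. The function names and the example scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a predictor with zero variance has no correlation")
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """Variance inflation factor for the two-predictor case: 1 / (1 - r^2)."""
    r_squared = pearson_r(x, y) ** 2
    if r_squared >= 1.0:
        # Perfectly collinear predictors: the inflation is unbounded.
        return math.inf
    return 1.0 / (1.0 - r_squared)


# Hypothetical scores on two strongly related reading measures.
measure_a = [1, 2, 3, 4]
measure_b = [1, 2, 3, 5]
print(round(vif_two_predictors(measure_a, measure_b), 2))
```

A value near 1 indicates little shared variance between the two predictors; large values indicate that LR coefficient estimates for them would be unstable.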

Method: The sample comprised 12,171 students from Florida whose third-grade RD risk was operationalized using the constellation model with one, two, three, or four RD indicators in first and second grade.
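One possible reading of this operationalization can be sketched as follows: a student is flagged as at risk when at least a chosen number of indicators fall below a cutoff. This is a sketch under stated assumptions, not the study's procedure. The abstract does not name the four indicators or the cutoffs, so the indicator names, the percentile cutoff of 25, and the function names below are all hypothetical.

```python
def count_rd_indicators(scores, cutoffs):
    """Count how many indicators fall below their cutoff for one student."""
    return sum(1 for name, cutoff in cutoffs.items() if scores[name] < cutoff)


def is_at_risk(scores, cutoffs, min_indicators):
    """Flag risk when at least `min_indicators` indicators are below cutoff."""
    if not 1 <= min_indicators <= len(cutoffs):
        raise ValueError("min_indicators must be between 1 and the number of indicators")
    return count_rd_indicators(scores, cutoffs) >= min_indicators


# Hypothetical indicator names and percentile cutoffs (assumptions, not from the study).
cutoffs = {
    "indicator_1": 25,
    "indicator_2": 25,
    "indicator_3": 25,
    "indicator_4": 25,
}

# Hypothetical percentile scores for one student.
student = {"indicator_1": 20, "indicator_2": 30, "indicator_3": 10, "indicator_4": 50}

for k in range(1, 5):
    print(k, is_at_risk(student, cutoffs, min_indicators=k))
```

Requiring more indicators yields a stricter definition of risk, so the same student can be flagged under a one-indicator rule and not under a three-indicator rule.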

Results: LR and RF performed on par in accurately predicting RD risk. Regarding predictor importance, reading fluency was consistently the most important predictor of RD risk.

Conclusion: Findings suggest that RF does not outperform LR in RD prediction accuracy in models with multiple linearly related predictors. Findings also highlight the value of including reading fluency in early identification batteries for later RD determination.

Keywords: logistic regression; prediction performance; random forest; reading disability.

Grants and funding

P50 HD052120/HD/NICHD NIH HHS/United States