Application of the performance of machine learning techniques as support in the prediction of school dropout

Sci Rep. 2024 Feb 17;14(1):3957. doi: 10.1038/s41598-024-53576-1.

Abstract

This article presents a study aimed at designing a model with 90% reliability that supports the prediction of school dropout in higher and secondary education institutions using machine learning techniques. The information was collected from open data published by the National Institute of Statistics and Geography: the 2015 Intercensal Survey and the 2010 and 2020 Population and Housing censuses, which contain information about the inhabitants and homes in the 32 federal entities of Mexico. The data were homologated, and twenty variables were selected based on correlation. After cleaning, the sample comprised 1,080,782 records in total. Supervised learning was used to create the model, with data processing automated across training and testing. The following techniques were applied: Artificial Neural Networks, Support Vector Machines, Linear Ridge and Lasso Regression, Bayesian Optimization, and Random Forest. The first two achieved a reliability greater than 99% and the last achieved 91%.
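The two preprocessing steps named in the abstract, selecting variables by correlation and separating training from testing records, can be illustrated with a minimal sketch. It is not the study's pipeline: the abstract does not list the twenty variables, the correlation measure, or the split ratio, so the column names (`age`, `household_size`, `noise`), the Pearson coefficient, the 80/20 split, and the synthetic data below are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(columns, target, k):
    """Return the k column names with the largest absolute correlation to target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def split_indices(n_rows, test_fraction, seed):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[: n_rows - n_test], indices[n_rows - n_test:]


# Synthetic stand-in for the census records (hypothetical columns, not the study's).
rng = random.Random(0)
n = 1000
columns = {
    "age": [rng.randint(12, 25) for _ in range(n)],
    "household_size": [rng.randint(1, 10) for _ in range(n)],
    "noise": [rng.random() for _ in range(n)],
}
# Toy target: dropout depends only on age, so age should rank first.
dropout = [1 if a >= 19 else 0 for a in columns["age"]]

selected = select_by_correlation(columns, dropout, k=1)
train_idx, test_idx = split_indices(n, test_fraction=0.2, seed=42)
print(selected, len(train_idx), len(test_idx))
```

In this toy setup the target is a deterministic function of one column, so that column ranks first; with real census data the ranking would depend on the variables and the correlation measure chosen.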

MeSH terms

  • Bayes Theorem
  • Humans
  • Machine Learning*
  • Neural Networks, Computer
  • Reproducibility of Results
  • Student Dropouts*
  • Support Vector Machine