Early Prediction of Student Learning Performance Through Data Mining: A Systematic Review

Psicothema. 2021 Aug;33(3):456-465. doi: 10.7334/psicothema2021.62.

Abstract

Background: Early prediction of students’ learning performance using data mining techniques is an active research topic. This literature review provides an overview of the current state of research in that area.

Method: We conducted a literature review following a two-step procedure: first searching for papers using the major search engines, then selecting papers based on certain criteria.

Results: The document search process yielded 133 results, 82 of which were selected to answer essential research questions in the area. The selected papers were grouped and described by the type of educational system, the data mining techniques applied, the variables or features used, and how early accurate prediction was possible.

Conclusions: Most of the papers analyzed were about online learning systems and traditional face-to-face learning in secondary and tertiary education. The most commonly used predictive algorithms were J48, Random Forest, SVM, and Naive Bayes (classification), and logistic and linear regression (regression). The most important factors in early prediction were related to student assessment and data obtained from student interaction with Learning Management Systems. Finally, how early it was possible to make predictions depended on the type of educational system.

Publication types

  • Systematic Review

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Data Mining*
  • Humans
  • Learning
  • Students*