Unveiling educational patterns at a regional level in Colombia: data from elementary and public high school institutions

Heliyon. 2021 Sep 17;7(9):e08017. doi: 10.1016/j.heliyon.2021.e08017. eCollection 2021 Sep.

Abstract

Although the field of Learning Analytics (LA) has grown considerably in the last few years, the vast majority of works in the literature focus on experimenting with techniques and methods over datasets restricted to a given discipline, course, or institution, and few works handle region- or countrywide datasets. This may be because implementing LA at a national or regional scope, using data from governments and institutions, poses many challenges that may threaten the success of such initiatives, including the very availability of data. The present article describes an experience of LA in Latin America using governmental data from Elementary and Middle Schools of the State of Norte de Santander, Colombia. The study focuses on students' performance. Data from 2013 to 2018 were collected, containing information related to 1) students' enrollment in school disciplines, provided by the Regional Education Secretary, 2) students' grades, provided by educational institutions, and 3) students' grades, provided by the national agency for education evaluation. The methodology comprises cleaning and integrating the data, followed by a descriptive and visual analysis and the application of educational data mining techniques (decision trees and clustering) to model and extract educational patterns. A total of eight patterns of interest are extracted. In addition to the decision trees, a feature ranking analysis was performed using XGBoost, and, to facilitate the visual representation of the clusters, t-SNE and self-organizing maps (SOM) were applied as projection techniques for the results.
Finally, this paper compares the main challenges reported in the literature against the Colombian experience and proposes an up-to-date list of challenges and solutions that can serve as a baseline for future work in this area, aligned with the Latin American context and reality.
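
As an illustration of the clustering step named in the methodology, the following is a minimal sketch of k-means in plain Python, not the authors' implementation. The records, the two features (an institutional grade and a national exam score), their scales, and the function names `min_max_scale` and `kmeans` are all hypothetical assumptions made for this example; the study itself used real governmental data and may have configured its clustering differently.

```python
import random
from statistics import mean


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no feature dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # avoid dividing by zero
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )


def kmeans(points, k, max_iterations=50, seed=0):
    """Plain k-means: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [nearest(p, centroids) for p in points]
    for _ in range(max_iterations):
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
        # Assignment step: re-attach each point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels, centroids


# Hypothetical records: (institutional grade, national exam score).
records = [(2.1, 38), (2.4, 41), (2.0, 35), (4.5, 72), (4.2, 68), (4.8, 75)]
labels, centroids = kmeans(min_max_scale(records), k=2)
print(labels)
```

In practice, a library implementation would normally replace this hand-written loop, and the resulting cluster labels would then be projected with a technique such as t-SNE or SOM for visual inspection, as the abstract describes.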

Keywords: Educational data; Educational data mining; Learning Analytics; Primary education; Secondary education.