GradeAid: a framework for automatic short answers grading in educational contexts - design, implementation and evaluation

Emiliano Del Gobbo; Alfonso Guarino; Barbara Cafarelli; Luca Grilli

doi:10.1007/s10115-023-01892-9

Knowl Inf Syst. 2023 May 19:1-40. doi: 10.1007/s10115-023-01892-9. Online ahead of print.

Authors

Emiliano Del Gobbo¹, Alfonso Guarino², Barbara Cafarelli¹, Luca Grilli¹

Affiliations

¹ Department of Economics, Management and Territory, University of Foggia, Via da Zara, 11, 71121 Foggia, FG Italy.
² Department of Humanities, University of Foggia, Via Arpi, 176, 71121 Foggia, FG Italy.

Abstract

Automatic short answer grading (ASAG), an active area of natural language understanding, is a research topic within learning analytics. ASAG solutions are conceived to lighten the workload of teachers and instructors, especially in higher education, where classes with hundreds of students are the norm and grading (short) answers to open-ended questionnaires becomes harder. Their outputs are valuable both for the grading itself and for providing students with "ad hoc" feedback. ASAG proposals have also enabled various intelligent tutoring systems. Over the years, a variety of ASAG solutions have been proposed; still, there are several gaps in the literature that we fill in this paper. The present work proposes GradeAid, a framework for ASAG. It is based on the joint analysis of lexical and semantic features of the students' answers through state-of-the-art regressors; unlike previous work, (i) it copes with non-English datasets, (ii) it has undergone a robust validation and benchmarking phase, and (iii) it has been tested on every publicly available dataset and on a new dataset (now available to researchers). GradeAid obtains performance comparable to the systems presented in the literature (root-mean-squared errors as low as 0.25, depending on the specific ⟨dataset-question⟩ tuple). We argue that it represents a strong baseline for further developments in the field.
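To make the ideas of "lexical and semantic features" and the root-mean-squared error concrete, the following is a minimal sketch and not the GradeAid implementation. It assumes a lexical feature given by token overlap (Jaccard) between a student answer and a reference answer, and a semantic feature given by cosine similarity between embedding vectors supplied by the caller. All function names (`tokens`, `jaccard`, `cosine`, `features`, `rmse`) and the choice of these two features are hypothetical illustrations; the abstract does not specify which features or regressors the framework uses.

```python
import math
import re


def tokens(text):
    """Lowercase word tokens; \\w is Unicode-aware, so non-English text works."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Assumed lexical feature: vocabulary overlap between two answers."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty answers are treated as identical
    return len(ta & tb) / len(ta | tb)


def cosine(u, v):
    """Assumed semantic feature: cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def features(student, reference, student_vec, reference_vec):
    """Feature vector that a regressor could map to a predicted grade."""
    return [jaccard(student, reference), cosine(student_vec, reference_vec)]


def rmse(y_true, y_pred):
    """Root-mean-squared error between true and predicted grades."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Two answers sharing 2 of 4 distinct words, with made-up embedding vectors.
x = features("The cat sat", "the cat ran", [1.0, 0.0], [1.0, 0.0])

# Two predictions, each off by 0.25 grade points.
error = rmse([1.0, 0.5], [0.75, 0.75])
```

In a full pipeline, feature vectors such as `x` would be computed for every answer to a question and passed to a regressor, and `rmse` would be evaluated on held-out answers for each dataset and question.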

Keywords: Automatic short answer grading; Learning analytics; Natural language processing.