Automatic grading for Arabic short answer questions using optimized deep learning model

Mustafa Abdul Salam; Mohamed Abd El-Fatah; Naglaa Fathy Hassan

doi:10.1371/journal.pone.0272269


PLoS One. 2022 Aug 2;17(8):e0272269. doi: 10.1371/journal.pone.0272269. eCollection 2022.

Authors

Mustafa Abdul Salam¹ ², Mohamed Abd El-Fatah³, Naglaa Fathy Hassan⁴

Affiliations

¹ Artificial Intelligence Dept., Faculty of Computers and Artificial Intelligence, Benha University, Banha, Egypt.
² Faculty of Computer Studies, Arab Open University, Nasr City, Egypt.
³ Information System Dept., Computers and Information Faculty - Benha University, Banha, Egypt.
⁴ Information and Operations Dept., National Center for Examinations and Educational Evaluation (NCEEE), Cairo, Egypt.

Abstract

Automatic grading of short-answer questions is considered a challenging problem in natural language processing. It requires a system to comprehend free-text answers in order to automatically assign a grade to a student answer compared with one or more model answers. This paper proposes an optimized deep learning model for automatically grading short-answer questions, using datasets of various sizes collected in the Science subject for seventh-grade students in Egypt. The proposed system is a hybrid approach that optimizes a deep learning technique, LSTM (Long Short-Term Memory), with a recent optimization algorithm, the Grey Wolf Optimizer (GWO). GWO is used to optimize the LSTM by selecting the best dropout and recurrent dropout rates (LSTM hyperparameters) instead of choosing them manually. Using GWO helps the LSTM model generalize better and avoid overfitting when predicting students' scores, which can improve the learning process and save instructors' time and effort. The model's performance is measured in terms of the Root Mean Squared Error (RMSE), the Pearson correlation coefficient, and R-squared. According to the simulation results, the hybrid GWO-LSTM model achieved the best performance and outperformed the classical LSTM model and the other compared models: it had the highest Pearson correlation coefficient, the lowest RMSE, and the best R-squared value in all experiments, although its training time was higher than that of the traditional deep learning model.
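The abstract names two computational pieces: a Grey Wolf Optimizer that searches for the dropout and recurrent dropout rates, and the RMSE, Pearson, and R-squared metrics used to score predictions. The following is a minimal, illustrative sketch under stated assumptions, not the authors' implementation. All function names (`gwo_minimize`, `surrogate_val_rmse`, `rmse`, `pearson_r`, `r_squared`) are hypothetical. The quadratic `surrogate_val_rmse` is an assumed stand-in for "train an LSTM with these rates and return its validation RMSE". The search box [0, 0.8] for both rates is an assumption. R-squared is taken here as the common coefficient of determination, 1 - SS_res / SS_tot, which may differ from the paper's exact definition. The GWO update follows the standard scheme: the three fittest wolves (alpha, beta, delta) each pull every wolf, and the coefficient `a` shrinks linearly from 2 to 0.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def gwo_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_wolves: int = 10,
    n_iters: int = 30,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` inside a box using the standard Grey Wolf Optimizer."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos: List[float] = list(wolves[0])
    best_fit = float("inf")

    def rank(pack):
        # Evaluate every wolf once and sort the pack from fittest to weakest.
        return sorted(((objective(w), w) for w in pack), key=lambda p: p[0])

    for t in range(n_iters):
        ranked = rank(wolves)
        if ranked[0][0] < best_fit:
            best_fit, best_pos = ranked[0][0], list(ranked[0][1])
        leaders = [w for _, w in ranked[:3]]  # alpha, beta, delta
        a = 2.0 - 2.0 * t / n_iters  # exploration coefficient, decreases 2 -> 0
        moved = []
        for w in wolves:
            new_w = []
            for d, (lo, hi) in enumerate(bounds):
                # Each leader proposes X_k = X_leader - A * |C * X_leader - X|.
                pulls = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    pulls.append(leader[d] - A * abs(C * leader[d] - w[d]))
                # Average the three proposals and clip back into the search box.
                new_w.append(min(max(sum(pulls) / 3.0, lo), hi))
            moved.append(new_w)
        wolves = moved
    final = rank(wolves)
    if final[0][0] < best_fit:
        best_fit, best_pos = final[0][0], list(final[0][1])
    return best_pos, best_fit


def surrogate_val_rmse(rates: Sequence[float]) -> float:
    # HYPOTHETICAL stand-in: a real run would train an LSTM with
    # dropout=rates[0] and recurrent_dropout=rates[1], then return validation RMSE.
    dropout, recurrent_dropout = rates
    return (dropout - 0.3) ** 2 + (recurrent_dropout - 0.2) ** 2


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between true and predicted scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rates, score = gwo_minimize(surrogate_val_rmse, bounds=[(0.0, 0.8), (0.0, 0.8)])
    print("dropout=%.3f recurrent_dropout=%.3f surrogate=%.5f" % (rates[0], rates[1], score))
    # Hypothetical grades, used only to show how the three metrics are called.
    true_scores, predicted = [2, 1, 0, 2], [1.8, 1.1, 0.3, 1.9]
    print(rmse(true_scores, predicted), pearson_r(true_scores, predicted), r_squared(true_scores, predicted))
```

Swapping the surrogate for a real training-and-validation routine is what makes the search expensive. Each objective call trains a full model, which is consistent with the higher training time the abstract reports for the hybrid model.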

MeSH terms

Algorithms
Deep Learning*
Forecasting
Humans
Language
Neural Networks, Computer*

Grants and funding

The authors received no specific funding for this work.