A comparative analysis on question classification task based on deep learning approaches

Muhammad Zulqarnain; Ahmed Khalaf Zager Alsaedi; Rozaida Ghazali; Muhammad Ghulam Ghouse; Wareesa Sharif; Noor Aida Husaini

doi:10.7717/peerj-cs.570

PeerJ Comput Sci. 2021 Aug 3:7:e570. doi: 10.7717/peerj-cs.570. eCollection 2021.

Authors

Muhammad Zulqarnain¹, Ahmed Khalaf Zager Alsaedi², Rozaida Ghazali¹, Muhammad Ghulam Ghouse¹, Wareesa Sharif³, Noor Aida Husaini¹

Affiliations

¹ Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), Batu Pahat, Johor, Malaysia.
² Physic Department, College of Science, University of Misan, Iraq.
³ Faculty of Computing, The Islamia University Bahawalpur, Bahawalpur, Punjab, Pakistan.

Abstract

Question classification is one of the essential tasks in implementing automatic question answering in natural language processing (NLP). Deep learning approaches have recently been applied successfully to several text-mining tasks, such as text classification, document categorization, web mining, sentiment analysis, and spam filtering. In this study, we investigated several deep learning approaches to question classification in Turkish, a highly inflected language, and trained and tested the architectures on a dataset of Turkish questions. We used three main deep learning approaches (Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN)) and also applied two combined architectures, CNN-GRU and CNN-LSTM. Furthermore, we applied the Word2vec technique with both the skip-gram and CBOW methods to produce word embeddings of various vector sizes from a large corpus composed of user questions. We compared the deep learning architectures on test accuracy and 10-fold cross-validation accuracy. The experimental results show that the choice of Word2vec technique has a considerable impact on the accuracy achieved by the different deep learning approaches. We attained an accuracy of 93.7% by using these techniques on the question dataset.
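
The two Word2vec methods named in the abstract differ in what they predict: skip-gram predicts each context word from the center word, while CBOW predicts the center word from its surrounding context. The sketch below is a minimal, standard-library illustration of how the training examples differ. It is not the authors' implementation; the function names, the window size, and the sample Turkish question are assumptions made only for illustration.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) pairs; skip-gram predicts context from center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context_words, center) pairs; CBOW predicts center from context."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # skip single-token inputs, which have no context
            pairs.append((context, center))
    return pairs


# Hypothetical tokenized question, invented for this example
# (not taken from the paper's dataset).
question = ["türkiye'nin", "başkenti", "neresidir"]

for center, context_word in skipgram_pairs(question, window=1):
    print("skip-gram:", center, "->", context_word)

for context, center in cbow_pairs(question, window=1):
    print("CBOW:", context, "->", center)
```

With the same window, skip-gram yields one training example per (center, context) pair, whereas CBOW yields one example per position. This is one reason the two methods can produce embeddings of different quality on the same corpus.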

Keywords: CBOW; Convolutional neural networks; Gated recurrent unit; Long short-term memory; Machine learning; Question classification; Skip-gram; Turkish dataset; Word2vec.

Grants and funding

This work was supported by the Ministry of Higher Education Malaysia (MOHE) and Universiti Tun Hussein Onn Malaysia under the Fundamental Research Grant Scheme (FRGS/1/2017/ICT02/UTHM/02/5), vote no. 1641. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.