Improving sentiment classification using a RoBERTa-based hybrid model

Front Hum Neurosci. 2023 Dec 7:17:1292010. doi: 10.3389/fnhum.2023.1292010. eCollection 2023.

Abstract

Introduction: Several attempts have been made to enhance the performance of text-based sentiment analysis. Classifiers and word embedding models have been among the most prominent of these. This work aims to develop a hybrid deep learning approach that combines the advantages of transformer models and sequence models while eliminating the sequence models' shortcomings.

Methods: In this paper, we present a hybrid model based on the transformer model and deep learning models to enhance the sentiment classification process. Robustly optimized BERT (RoBERTa) was selected to produce the representative vectors of the input sentences, and the Long Short-Term Memory (LSTM) model in conjunction with the Convolutional Neural Network (CNN) model was used to improve the suggested model's ability to comprehend the semantics and context of each input sentence. We tested the proposed model on two datasets with different topics: the first is a dataset of Twitter reviews of US airlines, and the second is the IMDb movie reviews dataset. We propose using word embeddings in conjunction with the SMOTE oversampling technique to overcome the challenge of imbalanced classes in the Twitter dataset.
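The SMOTE step described above can be sketched in plain NumPy: each synthetic minority-class sample is an interpolation between a real embedding vector and one of its k nearest minority-class neighbours. This is a minimal illustrative sketch, not the paper's actual pipeline; the function name `smote_oversample` and the toy data dimensions are assumptions (in practice one would likely use the imbalanced-learn library's SMOTE implementation).

```python
import numpy as np

def smote_oversample(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic minority-class vectors (basic SMOTE).

    minority : (n, d) array of minority-class embedding vectors.
    Each synthetic sample lies on the line segment between a real
    sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(rng)
    n = len(minority)
    # pairwise squared distances within the minority class
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = (diffs ** 2).sum(-1)
    np.fill_diagonal(dists, np.inf)  # exclude each point from its own neighbour list
    k = min(k, n - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest per sample

    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                  # pick a real minority sample
        b = neighbours[a, rng.integers(k)]   # pick one of its k neighbours
        gap = rng.random()                   # random interpolation factor in [0, 1)
        synthetic[i] = minority[a] + gap * (minority[b] - minority[a])
    return synthetic

# Toy imbalanced setting: 20 minority-class embeddings of dimension 8,
# oversampled with 80 synthetic vectors to reach 100 in total.
rng = np.random.default_rng(0)
minority_emb = rng.normal(size=(20, 8))
new_samples = smote_oversample(minority_emb, n_new=80, k=5, rng=1)
balanced_minority = np.vstack([minority_emb, new_samples])
print(balanced_minority.shape)  # (100, 8)
```

Because every synthetic vector is a convex combination of two real minority samples, the oversampled class stays inside the convex hull of the original embeddings rather than introducing arbitrary noise.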

Results: With an accuracy of 96.28% on the IMDb reviews dataset and 94.2% on the Twitter reviews dataset, the suggested hybrid model outperforms the standard methods.

Discussion: It is clear from these results that the proposed hybrid RoBERTa-(CNN+LSTM) method is an effective model for sentiment classification.

Keywords: CNN+LSTM; LSTM; RoBERTa; SMOTE; sentiment analysis; word embedding.

Grants and funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.