Sentiment Analysis and Comprehensive Evaluation of Supervised Machine Learning Models Using Twitter Data on Russia-Ukraine War

SN Comput Sci. 2023;4(4):346. doi: 10.1007/s42979-023-01790-5. Epub 2023 Apr 21.

Abstract

The Russia-Ukrainian War refers to the ongoing hostilities between Russia and Ukraine. When Russia started it in February 2014, the conflict first centred on whether Crimea and the Donbass were formally recognised as part of Ukraine. It escalated dramatically when Russia began its incursion into Ukraine on February 24, 2022, following a military build-up on the Russian-Ukrainian border that started in late 2021. The goal of this article is to examine public perceptions of the crisis between Russia and Ukraine. Social media now plays a significant role in communication, and as a result, opinions can be found on platforms like Facebook, Twitter, and Instagram. The study uses 11,250 tweets about the war between Russia and Ukraine, collected from Twitter. Machine learning has shown its applicability, power, and potential in areas including image processing, object identification, and natural language processing, and the same applies to text analytics. Machine learning techniques are frequently employed for text analysis, sentiment analysis, and entity annotation. Given the applicability and efficacy of machine learning models, a natural language processing toolkit in Python is used to examine the textual polarity and subjectivity scores of the tweets. Moreover, because machine learning models have a high degree of classification accuracy, they have been widely used to categorise emotions. We developed and tested models using three feature extraction techniques: TF-IDF (term frequency-inverse document frequency), BoW (bag of words), and N-gram. Each model was assessed using several important performance indicators, including accuracy, precision, recall, and F1 score. Results show that the extra trees classifier (ETC) achieves the highest accuracy, 0.84, in combination with BoW features. Comparisons have also been made with logistic regression (LR), decision tree (DT), support vector machine (SVM), XGBoost (XGB), Gaussian naive Bayes (GNB), AdaBoost (ADA), and K-nearest neighbours (KNN).
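
The three feature extraction techniques named in the abstract (BoW, N-gram, and TF-IDF) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the whitespace tokenizer, the function names, the example texts, and the particular TF-IDF variant (relative term frequency multiplied by the natural log of the inverse document frequency) are all assumptions made for illustration, and library implementations typically use different smoothing and normalisation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real tweets would
    # need cleaning of URLs, mentions, hashtags, and punctuation.
    return text.lower().split()


def ngrams(tokens, n):
    # Contiguous n-token sequences, joined into single string features.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, n=1):
    # Count each n-gram per document (n=1 gives the classic bag of words).
    counts = [Counter(ngrams(tokenize(d), n)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tf_idf(docs, n=1):
    # Assumed variant: tf = count / document length, idf = ln(N / df).
    vocab, counts = bag_of_words(docs, n)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for row in counts:
        total = sum(row)
        weights.append([(c / total) * w if total else 0.0
                        for c, w in zip(row, idf)])
    return vocab, weights


# Hypothetical example texts, not taken from the study's dataset.
docs = ["peace talks resume", "talks fail again", "peace peace now"]

bow_vocab, bow_matrix = bag_of_words(docs)          # unigram counts
bigram_vocab, bigram_matrix = bag_of_words(docs, 2)  # bigram counts
tfidf_vocab, tfidf_matrix = tf_idf(docs)             # weighted unigrams
```

Each row of the resulting matrices is a feature vector for one text, which is the form of input a classifier such as those compared in the abstract would be trained on.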

Keywords: Feature engineering; Machine learning; Sentiment analysis; Supervised machine learning models; Text classification.