A large-scale sentiment analysis of tweets pertaining to the 2020 US presidential election

J Big Data. 2022;9(1):79. doi: 10.1186/s40537-022-00633-z. Epub 2022 Jun 16.

Abstract

We capture the public sentiment towards candidates in the 2020 US Presidential Elections, by analyzing 7.6 million tweets sent out between October 31st and November 9th, 2020. We apply a novel approach to first identify tweets and user accounts in our database that were later deleted or suspended from Twitter. This approach allows us to observe the sentiment held for each presidential candidate across various groups of users and tweets: accessible tweets and accounts, deleted tweets and accounts, and suspended or inaccessible tweets and accounts. We compare the sentiment scores calculated for these groups and provide key insights into the differences. Most notably, we show that deleted tweets, posted after the Election Day, were more favorable to Joe Biden, and the ones posted leading to the Election Day, were more positive about Donald Trump. Also, the older a Twitter account was, the more positive tweets it would post about Joe Biden. The aim of this study is to highlight the importance of conducting sentiment analysis on all posts captured in real time, including those that are now inaccessible, in determining the true sentiments of the opinions around the time of an event.

Keywords: Natural Language Processing; Sentiment analysis; Twitter analysis; US Elections 2020.