Adapting recurrent neural networks for classifying public discourse on COVID-19 symptoms in Twitter content

Soft comput. 2022;26(20):11077-11089. doi: 10.1007/s00500-022-07405-0. Epub 2022 Aug 10.

Abstract

The COVID-19 infection, which began in December 2019, has claimed many lives and impacted all aspects of human life. With time, COVID-19 was identified as a pandemic outbreak by the World Health Organization (WHO), putting massive pressure on global health. During this ongoing pandemic, the exponential growth of social media platforms has provided valuable resources for distributing information, as well as a source for self-reported disease symptoms in public discourse. Therefore, there is an urgent need for effective approaches to detect self-reported symptoms or cases in social media content. In this study, we scrapped public discourse on COVID-19 symptoms in Twitter content. For this, we developed a huge dataset of COVID-19 self-reported symptoms and gold-annotated the tweets into four categories: confirmed, death, suspected, and recovered. Then, we use a machine and deep machine learning models, each with its own set of features, such as feature representation. Furthermore, the experimentations were achieved with recurrent neural networks (RNNs) variants and compared their performance with traditional machine learning algorithms. Experimental results report that optimizing the area under the curve (AUC) enhances model performance, and the long short-term memory (LSTM) has the highest accuracy in detecting COVID-19 symptoms in real-time public messaging. Thus, the LSTM classifier in the proposed pipeline achieves a classification accuracy of 90.7%, outperforming existing state-of-the-art algorithms for multi-class classification.

Keywords: COVID-19; Classification; Coronavirus; Deep learning; Pandemic; Recurrent neural networks; Twitter.