Adapting recurrent neural networks for classifying public discourse on COVID-19 symptoms in Twitter content

Samina Amin; Abdullah Alharbi; M Irfan Uddin; Hashem Alyami

doi:10.1007/s00500-022-07405-0

Soft comput. 2022;26(20):11077-11089. doi: 10.1007/s00500-022-07405-0. Epub 2022 Aug 10.

Authors

Samina Amin¹, Abdullah Alharbi², M Irfan Uddin¹, Hashem Alyami³

Affiliations

¹ Institute of Computing, Kohat University of Science and Technology, Kohat, 2600 Pakistan.
² Department of Information Technology, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif, 21944 Saudi Arabia.
³ Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif, 21944 Saudi Arabia.

Abstract

The COVID-19 outbreak, which began in December 2019, has claimed many lives and affected every aspect of human life. The World Health Organization (WHO) subsequently declared COVID-19 a pandemic, placing massive pressure on global health. During the ongoing pandemic, the rapid growth of social media platforms has provided a valuable channel for distributing information, as well as a source of self-reported disease symptoms in public discourse. There is therefore an urgent need for effective approaches to detect self-reported symptoms or cases in social media content. In this study, we scraped public discourse on COVID-19 symptoms from Twitter. We built a large dataset of self-reported COVID-19 symptoms and gold-annotated the tweets into four categories: confirmed, death, suspected, and recovered. We then applied machine learning and deep learning models, each paired with its own feature representation. Experiments were conducted with variants of recurrent neural networks (RNNs), and their performance was compared with that of traditional machine learning algorithms. Experimental results show that optimizing the area under the curve (AUC) improves model performance and that the long short-term memory (LSTM) network achieves the highest accuracy in detecting COVID-19 symptoms in real-time public messages. The LSTM classifier in the proposed pipeline achieves a classification accuracy of 90.7%, outperforming existing state-of-the-art algorithms for multi-class classification.
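The abstract reports both accuracy and AUC for a four-class task (confirmed, death, suspected, recovered) but does not say how AUC is extended beyond two classes. A common choice is one-vs-rest AUC averaged over the classes (macro averaging); the sketch below assumes that choice. The function names and the five-tweet score table are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, List, Sequence

# The four tweet categories named in the abstract.
LABELS = ["confirmed", "death", "suspected", "recovered"]


def binary_auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Hypothetical helper for illustration.
    """
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: List[Dict[str, float]], y_true: List[str]) -> float:
    """Assumed metric: one-vs-rest AUC per class, averaged with equal weight."""
    per_class = []
    for label in LABELS:
        scores = [p[label] for p in probs]
        positives = [y == label for y in y_true]
        per_class.append(binary_auc(scores, positives))
    return sum(per_class) / len(per_class)


def accuracy(probs: List[Dict[str, float]], y_true: List[str]) -> float:
    """Fraction of tweets whose highest-scoring class matches the gold label."""
    preds = [max(p, key=p.get) for p in probs]
    return sum(pr == y for pr, y in zip(preds, y_true)) / len(y_true)


# Made-up classifier outputs (e.g. softmax probabilities) for five hypothetical tweets.
toy_probs = [
    {"confirmed": 0.70, "death": 0.10, "suspected": 0.10, "recovered": 0.10},
    {"confirmed": 0.10, "death": 0.60, "suspected": 0.20, "recovered": 0.10},
    {"confirmed": 0.20, "death": 0.10, "suspected": 0.60, "recovered": 0.10},
    {"confirmed": 0.10, "death": 0.10, "suspected": 0.10, "recovered": 0.70},
    {"confirmed": 0.65, "death": 0.10, "suspected": 0.15, "recovered": 0.10},
]
toy_labels = ["confirmed", "death", "suspected", "recovered", "suspected"]

acc = accuracy(toy_probs, toy_labels)
auc = macro_ovr_auc(toy_probs, toy_labels)
print(round(acc, 3))  # → 0.8
print(round(auc, 3))  # → 0.958
```

With these made-up scores, one suspected tweet is misread as confirmed, so accuracy is 0.8, while macro AUC is 23/24 (about 0.958). The two metrics measure different things: accuracy looks only at the top-scoring class, whereas AUC asks how well each class's scores rank its positive tweets above the rest, which is why a model can be tuned on AUC and still be reported by accuracy.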

Keywords: COVID-19; Classification; Coronavirus; Deep learning; Pandemic; Recurrent neural networks; Twitter.

© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.