An Infoveillance System for Detecting and Tracking Relevant Topics From Italian Tweets During the COVID-19 Event

IEEE Access. 2020 Jul 17;8:132527-132538. doi: 10.1109/ACCESS.2020.3010033. eCollection 2020.

Abstract

The year 2020 opened with a dramatic epidemic caused by a new species of coronavirus, which the WHO soon declared a pandemic because of the high number of deaths and the critical mass of patients hospitalized worldwide, on the order of millions. The COVID-19 pandemic forced the governments of hundreds of countries to impose heavy restrictions on citizens' socio-economic life. Italy was one of the most affected countries, with long-term restrictions that strained its socio-economic fabric. During this lockdown period, people relied mostly on online social media for information, where a heated debate accompanied every major ongoing event. In this scenario, this study presents an in-depth analysis of the main emergent topics discussed within the Italian Twitter community during the lockdown phase. The analysis was conducted with a general-purpose methodological framework, grounded in a biological metaphor and in a chain of NLP and graph analysis techniques, that detects and tracks emerging topics in online social media such as streams of Twitter data. A term-frequency analysis over consecutive time slots is pipelined with nutrition and energy metrics to compute hot terms, also exploiting tweet quality information such as the social influence of the users. Finally, a co-occurrence analysis is used to build a topic graph from which emerging topics are selected. Through careful parameter setting, we demonstrate the effectiveness of the topic tracking system, which is tailored to the current restrictions of the Twitter standard API, in capturing the main sociopolitical events that occurred during this dramatic phase.
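The pipeline summarized in the abstract (per-slot term statistics, nutrition and energy scores, hot-term selection, co-occurrence graph) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: nutrition is taken as the influence-weighted count of tweets containing a term in a slot, energy as the growth of squared nutrition relative to a few earlier slots (discounted by age), and the function names, the `window` parameter, and the toy data are hypothetical. It is not the authors' implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations

# A slot is a list of (tokens, influence) pairs: the tokens of one tweet and
# a weight standing in for the author's social influence (assumed input form).

def nutrition(slot):
    """Per term, sum the influence of every tweet in the slot that contains it."""
    scores = defaultdict(float)
    for tokens, influence in slot:
        for term in set(tokens):  # count each term once per tweet
            scores[term] += influence
    return dict(scores)

def energy(slots, window=2):
    """Assumed energy: growth of squared nutrition in the last slot compared
    with up to `window` earlier slots, each difference divided by its age."""
    current = nutrition(slots[-1])
    history = [nutrition(s) for s in slots[-1 - window:-1]]
    result = {}
    for term, now in current.items():
        total = 0.0
        for age, past in enumerate(reversed(history), start=1):
            total += (now ** 2 - past.get(term, 0.0) ** 2) / age
        result[term] = total
    return result

def hot_terms(slots, top_k=3, window=2):
    """Return the top_k terms of the last slot ranked by energy (ties by name)."""
    scores = energy(slots, window)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

def cooccurrence_graph(slot, terms):
    """Weighted edges between selected terms that appear in the same tweet."""
    keep = set(terms)
    edges = Counter()
    for tokens, _ in slot:
        present = sorted(keep.intersection(tokens))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return dict(edges)

# Toy data: three consecutive time slots of already tokenized tweets.
slots = [
    [(["covid", "news"], 1.0), (["covid", "italy"], 1.0)],
    [(["covid", "lockdown"], 2.0), (["lockdown", "italy"], 1.0)],
    [(["lockdown", "decree"], 3.0), (["lockdown", "decree", "covid"], 2.0)],
]
hot = hot_terms(slots, top_k=2)
graph = cooccurrence_graph(slots[-1], hot)
print(hot, graph)
```

In this toy run a term that stays at a constant level across slots gets zero energy, while terms that surge in the last slot rank highest; connected groups of hot terms in the resulting graph would then be candidate topics.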

Keywords: COVID-19; Natural language processing; infodemiology; infoveillance; social network analysis; text mining; topic detection; topic tracking.

Grants and funding

This work was supported in part by the Sapienza Research Calls project “PARADISE-PARAllel and DIStributed Evolutionary agent-based systems for machine learning and big data mining”, 2018.