Dynamic topic modeling of twitter data during the COVID-19 pandemic

PLoS One. 2022 May 27;17(5):e0268669. doi: 10.1371/journal.pone.0268669. eCollection 2022.

Abstract

In an effort to gauge the global pandemic's impact on social thought and behavior, it is important to answer the following questions: (1) What topics are individuals and groups vocalizing in relation to the pandemic? (2) Are there noticeable topic trends, and if so, how do these topics change over time and in response to major events? In this paper, using the Sequential Latent Dirichlet Allocation (Sequential LDA) model, a dynamic topic model, we identified twelve of the most popular topics in a Twitter dataset collected in the United States between April 3 and April 13, 2020, and discussed their growth and changes over time. These topics were both robust, in that they covered specific domains rather than single events, and dynamic, in that they changed over time in response to rising trends in our dataset. They spanned politics, healthcare, community, and the economy, and experienced macro-level growth over time while also exhibiting micro-level changes in topic composition. Our approach differentiated itself from prior work in both scale and scope, studying the emerging topics concerning COVID-19 at a scale that few works have achieved. We contributed to the interdisciplinary field at the intersection of urban studies and big data. Although we are optimistic about the future, we also understand that this is an unprecedented time that will have lasting impacts on individuals and society at large, affecting not only the economy and geopolitics but also human behavior and psychology. Therefore, in more ways than one, this research is only beginning to scratch the surface of what will be a concerted research effort into the history and repercussions of COVID-19.
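A dynamic topic model such as Sequential LDA is fitted on a corpus split into ordered time slices, so that each topic's word distribution can drift from one slice to the next. The sketch below illustrates only that preprocessing step, under stated assumptions: tweets arrive as hypothetical (date, text) pairs, each calendar day in the April 3 to April 13, 2020 window is one slice, and the helper name build_time_slices is hypothetical. It is not the authors' pipeline. The output shape (documents in chronological order plus a per-slice document count) matches what, for example, gensim's LdaSeqModel expects for its time_slice argument.

```python
from collections import Counter
from datetime import date, timedelta


def build_time_slices(tweets, start, end):
    """Order tweets chronologically and count how many fall on each day.

    tweets: list of (date, text) pairs (a hypothetical input format).
    Returns (ordered_texts, time_slice), where time_slice[i] is the number
    of tweets dated start + i days. Tweets outside [start, end] are dropped,
    so sum(time_slice) == len(ordered_texts).
    """
    # Keep only tweets inside the study window.
    in_range = [(d, t) for d, t in tweets if start <= d <= end]
    # Stable sort by date: same-day tweets keep their original order.
    in_range.sort(key=lambda pair: pair[0])
    counts = Counter(d for d, _ in in_range)
    n_days = (end - start).days + 1
    # One slice per day; a day with no tweets yields a count of 0.
    time_slice = [counts.get(start + timedelta(days=i), 0) for i in range(n_days)]
    ordered_texts = [t for _, t in in_range]
    return ordered_texts, time_slice


# Toy usage with made-up tweets (illustrative only, not study data).
tweets = [
    (date(2020, 4, 4), "stay home"),
    (date(2020, 4, 3), "masks"),
    (date(2020, 4, 4), "stimulus check"),
    (date(2020, 4, 20), "outside the window"),
]
texts, slices = build_time_slices(tweets, date(2020, 4, 3), date(2020, 4, 13))
print(texts)   # ['masks', 'stay home', 'stimulus check']
print(slices)  # [1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Empty days produce zero-length slices here; a real pipeline might merge or drop them before fitting, depending on what the chosen topic-modeling library tolerates.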

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19* / epidemiology
  • Cross-Sectional Studies
  • Humans
  • Pandemics
  • Politics
  • Social Media*

Grants and funding

This work was sponsored by the NYU Shanghai Laboratory of Urban Design and Science (LOUD); the Zaanheh Project and Center for Data Science and Artificial Intelligence at New York University (Shanghai); the NYU Shanghai Major-Grants Seed Fund (Grant No. 2022CHGuan_MGSF); the PEAK Urban programme, supported by UKRI’s Global Challenge Research Fund (Grant Ref: ES/P011055/1); and the Fujian Urban Investment and Technology Institute’s Research Fund (Grant No. 20210201 FJCT). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.