Application of Topic Modeling to Tweets as the Foundation for Health Disparity Research for COVID-19

Stud Health Technol Inform. 2020 Jun 26:272:24-27. doi: 10.3233/SHTI200484.

Abstract

We randomly sampled publicly available Tweets mentioning COVID-19-related terms (n=2,558,474 Tweets) from Tweet corpora collected daily through an API from January 21 to May 3, 2020. Using natural language processing (NLP), we applied a clustering algorithm to the publicly available Tweets authored by African Americans (n=1,763) to detect topics and sentiment. We visualized fifteen topics, grouped into four themes, using network diagrams (Newman modularity 0.74). Compared with COVID-19-related Tweets authored by others, African American Twitter communities uniquely showed positive sentiment, cohesive and encouraging online discussions (e.g., Black strong 27.1%, growing up Blacks 22.8%, support Black business 17.0%, how to build resilience 7.8%), and COVID-19 prevention behaviors (e.g., masks 4.7%, encouraging social distancing 9.4%). Applying topic modeling techniques to streaming Twitter data gives research teams a foundation for insights regarding information and for future virtual interventions and social media-based health disparity research for COVID-19.
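
For readers unfamiliar with the Newman modularity figure reported above, the following is a minimal sketch of how modularity can be computed for an undirected, unweighted network that has already been partitioned into clusters. It is an illustration only and not the authors' pipeline: the function name, the toy edge list, and the community labels are all hypothetical, and the abstract does not state which software or clustering algorithm was used.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q for an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m  -  (d_c / (2m))^2
    where m is the number of edges, L_c the number of edges with both
    endpoints in c, and d_c the summed degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # L_c per community
    degree_sum = defaultdict(int)  # d_c per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy network: two triangles joined by a single bridging edge.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
toy_communities = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

q = modularity(toy_edges, toy_communities)
print(round(q, 4))
```

Values of Q closer to 1 indicate that edges fall within clusters far more often than expected by chance, while a partition that places every node in one cluster yields Q = 0.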

Keywords: health disparities; pandemic; social media; virtual intervention.

MeSH terms

  • Betacoronavirus*
  • COVID-19
  • Coronavirus Infections*
  • Humans
  • Pandemics*
  • Pneumonia, Viral*
  • SARS-CoV-2
  • Social Media