Trend clustering from COVID-19 tweets using graphical lasso-guided iterative principal component analysis

Sci Rep. 2022 Apr 5;12(1):5709. doi: 10.1038/s41598-022-09651-6.

Abstract

This article presents a method for trend clustering from tweets about coronavirus disease (COVID-19) to help us objectively review the past and make decisions about future countermeasures. We aim to avoid detecting usual trends based on seasonal events while detecting essential trends caused by the influence of COVID-19. To this end, we regard daily changes in the frequency of each word in tweets as time series signals and define time series signals with single peaks as target trends. To successfully cluster the target trends, we propose graphical lasso-guided iterative principal component analysis (GLIPCA). GLIPCA enables us to remove trends with indirect correlations generated by other essential trends. Moreover, GLIPCA overcomes the difficulty in the quantitative evaluation of the accuracy of trend clustering. Thus, GLIPCA's parameters are easier to determine than those of other clustering methods. We conducted experiments using Japanese tweets about COVID-19 from March 8, 2020, to May 7, 2020. The results show that GLIPCA successfully distinguished trends before and after the declaration of a state of emergency on April 7, 2020. In addition, the results reveal the international debate about whether the Tokyo 2020 Summer Olympics should be held. The results suggest the tremendous social impact of the words and actions of Japanese celebrities. Furthermore, the results suggest that people's attention moved from worry and fear of an unknown novel pneumonia to the need for medical care and a new lifestyle as well as the scientific characteristics of COVID-19.
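
The signal definition in the abstract (daily word frequencies treated as time series, with single-peaked series as target trends) can be illustrated with a minimal Python sketch. It covers only this preprocessing idea, not GLIPCA itself. The function names, the whitespace tokenization, and the threshold-based single-peak test with its `ratio` parameter are all assumptions made for illustration; the abstract does not specify them, and Japanese text would in practice need a morphological tokenizer.

```python
from collections import Counter, defaultdict


def daily_word_frequencies(tweets):
    """Build one daily count series per word.

    tweets: iterable of (day_index, text) pairs, day_index starting at 0.
    Tokenization by whitespace is an assumption for illustration only.
    """
    tweets = list(tweets)
    n_days = max(day for day, _ in tweets) + 1
    series = defaultdict(lambda: [0] * n_days)
    for day, text in tweets:
        for word, count in Counter(text.lower().split()).items():
            series[word][day] += count
    return dict(series)


def has_single_peak(signal, ratio=0.5):
    """Hypothetical single-peak test (not the paper's criterion).

    A signal counts as single-peaked when the days at or above
    ratio * max(signal) form exactly one contiguous run.
    """
    peak = max(signal)
    if peak == 0:
        return False
    above = [value >= ratio * peak for value in signal]
    # Count the starts of contiguous runs of "above threshold" days.
    runs = sum(
        1 for i, flag in enumerate(above)
        if flag and (i == 0 or not above[i - 1])
    )
    return runs == 1


# Toy data: four days of invented example tweets.
tweets = [
    (0, "mask shortage"),
    (1, "mask mask shortage"),
    (2, "olympics postponed"),
    (3, "mask olympics"),
]
series = daily_word_frequencies(tweets)
targets = sorted(w for w, s in series.items() if has_single_peak(s))
```

In this toy input, a word that rises once and fades is kept as a candidate trend, while a word whose counts rise, fall to zero, and rise again is discarded. A real pipeline would then pass the retained series to the clustering step described in the abstract.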

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19* / epidemiology
  • Cluster Analysis
  • Humans
  • Principal Component Analysis
  • SARS-CoV-2
  • Social Media*