Using search trends to analyze web-based users' behavior profiles connected with COVID-19 in mainland China: infodemiology study based on hot words and Baidu Index

PeerJ. 2022 Nov 9;10:e14343. doi: 10.7717/peerj.14343. eCollection 2022.

Abstract

Background: Mainland China, the world's most populous region, experienced large-scale coronavirus disease 2019 (COVID-19) outbreaks in both 2020 and 2021. Existing infodemiology studies have concentrated primarily on the prospective surveillance of confirmed cases or of symptoms that met investigators' criteria. However, the actual impact of COVID-19 on the public, and the resulting attitudes of different groups towards the COVID-19 epidemic, have been neglected.

Methods: This study aimed to examine public web-based search trends and behavior patterns related to COVID-19 outbreaks in mainland China by using hot words and the Baidu Index (BI). The initial hot words (high-frequency words on the Internet) and the epidemic data (2019/12/01-2021/11/30) were mined from infodemiology platforms. The final hot-word table was established through two rounds of hot-word screening and a two-level hot-word classification. The temporal distribution and demographic portraits of COVID-19-related searches were queried through the search trends service provided by BI and used for correlation analysis. Finally, we used parameter estimation to quantitatively forecast the future geographical distribution of COVID-19.
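The abstract does not say which correlation statistic was used. As a minimal sketch, assuming a Spearman rank correlation between one hot word's search-index series and confirmed cases over the same time bins, with an optional lag so that searches can lead cases, the temporal correlation step might look like this. The function names (`ranks`, `spearman`, `lagged_spearman`) and the toy series are hypothetical illustrations, not the study's code or data.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


def lagged_spearman(search, cases, lag=0):
    """Correlate search volume at bin t with confirmed cases at bin t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return spearman(search, cases)
    return spearman(search[:-lag], cases[lag:])


# Invented weekly toy series (hypothetical, not study data).
search = [120, 340, 900, 2100, 1500, 700, 300]
cases = [2, 10, 60, 250, 400, 180, 40]
print(round(lagged_spearman(search, cases, 0), 4))  # 0.8929
print(round(lagged_spearman(search, cases, 1), 4))  # 0.8857
```

Rank correlation is used here because search indices and case counts are heavily skewed around outbreak peaks; scanning several lags is one simple way to check whether search peaks precede or coincide with case peaks.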

Results: The final English-Chinese bilingual table comprised six domains and 32 subordinate hot words. The temporal distributions of the domains and subordinate hot words in 2020 and 2021 showed that peaks in searches for subordinate hot words were significantly temporally correlated with COVID-19 outbreak periods, and that the subordinate hot words in the COVID-19 Related and Territory domains were reliable for COVID-19 surveillance. By gender, the Territory domain was searched more by males (male proportion: 67.69%; standard deviation (SD): 5.88%), whereas the Symptoms/Symptom and Public Health domains were searched more by females (female proportion: 57.95% and 56.61%; SD: 0 and 9.06%, respectively). By age, people aged 20-50 (middle-aged people) had a higher online search intensity, and the 20-29 and 30-39 age groups focused more on the Media and Symptoms/Symptom domains, respectively (proportion: 45.43% and 51.66%; SD: 15.37% and 16.59%). Finally, based on the frequency rankings of hot-word searches and of confirmed cases in mainland China, the epidemic situations of the provinces and other Chinese administrative divisions were divided into five levels of early-warning regions. The Central, East and South China regions were predicted to be affected by COVID-19 again in the future.
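The abstract does not describe how the two frequency rankings were combined into five early-warning levels. One possible scheme, offered only as a hypothetical sketch: rank each region by hot-word search frequency and by confirmed cases, average the two ranks, and cut the resulting order into five near-equal bins, with level 1 marking the highest concern. The function names, region labels and values below are invented for illustration.

```python
def rank_desc(scores):
    """Rank names 1..n by descending score (1 = largest); ties broken alphabetically."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: i + 1 for i, name in enumerate(ordered)}


def early_warning_levels(search_freq, confirmed, n_levels=5):
    """Hypothetical scheme: average the two ranks, then bin into n_levels groups.

    Level 1 = best combined rank (most concern); level n_levels = least concern.
    """
    if search_freq.keys() != confirmed.keys():
        raise ValueError("both mappings must cover the same regions")
    if len(search_freq) < n_levels:
        raise ValueError("need at least as many regions as levels")
    rs, rc = rank_desc(search_freq), rank_desc(confirmed)
    combined = {name: (rs[name] + rc[name]) / 2 for name in search_freq}
    ordered = sorted(combined, key=lambda name: (combined[name], name))
    n = len(ordered)
    # Integer division spreads the n ordered regions over n_levels near-equal bins.
    return {name: i * n_levels // n + 1 for i, name in enumerate(ordered)}


# Invented toy values for five placeholder regions A-E (not study data).
search = {"A": 50, "B": 40, "C": 30, "D": 20, "E": 10}
cases = {"A": 5, "B": 500, "C": 50, "D": 1, "E": 20}
print(early_warning_levels(search, cases))
# {'B': 1, 'A': 2, 'C': 3, 'E': 4, 'D': 5}
```

Averaging ranks weights search interest and case counts equally; a real early-warning scheme could instead weight the two signals differently or use fixed thresholds rather than equal-size bins.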

Keywords: Baidu index; Behavior profiles; COVID-19; Hot words; Mainland China.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • COVID-19* / epidemiology
  • China / epidemiology
  • Female
  • Humans
  • Infodemiology
  • Internet
  • Male
  • Middle Aged
  • Prospective Studies

Grants and funding

This research was supported by grants from the Key Research & Development Project of Nanhua Biomedical Co., Ltd (No H202191490139), the National Natural Science Foundation of China (No 31872866), and the China Postdoctoral Science Foundation (No 2021M701160). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.