Tracking U.S. Pertussis Incidence: Correlation of Public Health Surveillance and Google Search Data Varies by State

Sci Rep. 2019 Dec 24;9(1):19801. doi: 10.1038/s41598-019-56385-z.

Abstract

The Morbidity and Mortality Weekly Reports of the U.S. Centers for Disease Control and Prevention document a raw proxy for counts of pertussis cases in the U.S., and the Project Tycho (PT) database provides an improved source of these weekly data. These data are limited by reporting delays, variation in state-level surveillance practices, and changes over time in diagnostic methods. We aim to assess whether Google Trends (GT) search data track pertussis incidence relative to PT data and whether sociodemographic characteristics explain some of the variation in the accuracy of state-level models. GT and PT data were used to construct autocorrelation-corrected linear models of pertussis incidence in 2004-2011 for the entire U.S. and for each individual state. The national model yielded a moderate correlation (adjusted R² = 0.2369, p < 0.05), and state models tracked PT data for some but not all states. Sociodemographic variables explained approximately 30% of the variation in the performance of individual state-level models. The significant correlation between GT models and public health data suggests that GT is a potentially useful pertussis surveillance tool. However, the variable accuracy of this tool by state suggests that GT surveillance cannot be applied uniformly across geographic sub-regions.
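
To illustrate what an autocorrelation-corrected linear model can look like, the following minimal Python sketch applies the Cochrane-Orcutt procedure, one common AR(1) correction, to synthetic weekly data. The abstract does not state which correction the authors used, so the procedure, the function names (ols, cochrane_orcutt, adjusted_r2), and the simulated search-index and case-count series are all assumptions made for illustration, not the study's method or data.

```python
import random


def ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cochrane_orcutt(x, y, n_iter=20):
    """Iteratively estimate a linear model with AR(1) errors.

    Assumed procedure (not confirmed by the abstract): estimate rho from
    the lag-1 residuals, quasi-difference both series, refit, repeat.
    Returns (intercept, slope, rho).
    """
    intercept, slope = ols(x, y)
    rho = 0.0
    for _ in range(n_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        num = sum(r1 * r0 for r1, r0 in zip(resid[1:], resid[:-1]))
        den = sum(r0 * r0 for r0 in resid[:-1])
        rho = num / den
        # Quasi-difference the data to remove first-order autocorrelation.
        x_star = [x1 - rho * x0 for x1, x0 in zip(x[1:], x[:-1])]
        y_star = [y1 - rho * y0 for y1, y0 in zip(y[1:], y[:-1])]
        intercept_star, slope = ols(x_star, y_star)
        intercept = intercept_star / (1.0 - rho)
    return intercept, slope, rho


def adjusted_r2(y, y_hat, n_predictors=1):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Synthetic stand-ins (hypothetical): a weekly search index and weekly
# case counts whose errors follow an AR(1) process with rho = 0.6.
rng = random.Random(0)
n_weeks = 400
search_index = [rng.uniform(0, 100) for _ in range(n_weeks)]
errors = [rng.gauss(0, 1)]
for _ in range(n_weeks - 1):
    errors.append(0.6 * errors[-1] + rng.gauss(0, 1))
cases = [2.0 + 0.5 * s + e for s, e in zip(search_index, errors)]

intercept, slope, rho = cochrane_orcutt(search_index, cases)
fitted = [intercept + slope * s for s in search_index]
print("slope:", slope, "rho:", rho, "adj. R^2:", adjusted_r2(cases, fitted))
```

In a state-level analysis, the same fit would be repeated per state and the resulting adjusted R² values compared. The synthetic series here are far cleaner than real surveillance data, so the fit quality of this sketch says nothing about the values reported above.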

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Centers for Disease Control and Prevention, U.S.
  • Child
  • Child, Preschool
  • Geography
  • Humans
  • Incidence
  • Infant
  • Middle Aged
  • Morbidity
  • Public Health Informatics
  • Public Health Surveillance / methods*
  • Reproducibility of Results
  • Search Engine
  • Social Class
  • United States
  • Whooping Cough / epidemiology*
  • Young Adult