Tuberculosis surveillance by analyzing Google trends

IEEE Trans Biomed Eng. 2011 Aug;58(8). doi: 10.1109/TBME.2011.2132132. Epub 2011 Mar 24.

Abstract

Tuberculosis (TB) is a major global health concern, causing nearly ten million new cases and over one million deaths every year. Early detection of possible epidemics is an important first line of defense against TB. However, traditional surveillance systems, such as that of the US Centers for Disease Control and Prevention (CDC), publish TB morbidity surveillance results on a quarterly basis, with a reporting lag of months. Moreover, some developing countries, where most infections occur, may lack the medical resources needed to build traditional surveillance systems. To improve early detection of TB outbreaks, we developed a syndromic approach that estimates the actual number of TB cases from Google search volume. Specifically, we examined the search volume of nineteen TB-related terms, collected from January 2004 to April 2009, for surveillance purposes. Contemporaneous TB surveillance data were extracted from the CDC's reports to build and evaluate the syndromic system. We estimate actual TB occurrences using a non-stationary dynamic system, and separate models are built to monitor TB activity at both the national and state levels. The surveillance results of the syndromic system can be updated every day, twelve weeks ahead of the CDC's reports.
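
The abstract does not specify the form of the non-stationary dynamic system. One plausible reading, offered here only as an illustrative assumption and not as the authors' model, is a dynamic linear regression in which the coefficient linking search volume to reported cases drifts over time as a random walk and is tracked with a Kalman filter. The sketch below simplifies the nineteen terms to a single aggregated search index; the function name `dynamic_regression` and the noise parameters `q` and `r` are hypothetical. Weeks with no published CDC figure are passed as `None`, so the filter keeps producing estimates from search volume alone, which mirrors the idea of reporting ahead of official counts.

```python
from typing import List, Optional, Tuple


def dynamic_regression(
    search_volume: List[float],
    reported_cases: List[Optional[float]],
    q: float = 0.01,     # hypothetical process-noise variance (how fast beta may drift)
    r: float = 1.0,      # hypothetical observation-noise variance
    beta0: float = 0.0,  # initial guess for the coefficient
    p0: float = 1e3,     # large initial variance = weak prior
) -> Tuple[List[float], List[float]]:
    """Kalman-filter a random-walk coefficient beta_t in y_t = beta_t * x_t + noise.

    Assumed model (illustrative only):
        beta_t = beta_{t-1} + w_t,  w_t ~ N(0, q)
        y_t    = beta_t * x_t + v_t, v_t ~ N(0, r)
    Returns (one-step-ahead case estimates, filtered beta_t per step).
    A None in reported_cases means no official count yet: predict only.
    """
    beta, p = beta0, p0
    estimates: List[float] = []
    betas: List[float] = []
    for x, y in zip(search_volume, reported_cases):
        # Predict step: random walk keeps the mean and inflates the variance.
        p_prior = p + q
        y_hat = beta * x
        estimates.append(y_hat)
        if y is not None:
            # Update step with observation matrix H = x.
            s = x * x * p_prior + r      # innovation variance
            k = p_prior * x / s          # Kalman gain
            beta = beta + k * (y - y_hat)
            p = (1.0 - k * x) * p_prior
        else:
            # No official count available: carry the prior forward.
            p = p_prior
        betas.append(beta)
    return estimates, betas


# Synthetic usage: cases are exactly twice the search index, and the
# last five weeks have no official count yet.
volume = [10.0] * 20
cases: List[Optional[float]] = [20.0] * 15 + [None] * 5
estimates, betas = dynamic_regression(volume, cases)
print(f"final beta: {betas[-1]:.4f}, latest estimate: {estimates[-1]:.2f}")
```

The random-walk prior is what makes the model non-stationary: the search-to-case relationship is allowed to change over time instead of being fixed by a single regression fit. A multi-term version would replace the scalar `beta` with a coefficient vector and `p` with a covariance matrix.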

MeSH terms

  • Algorithms
  • Computer Simulation
  • Data Interpretation, Statistical
  • Data Mining / statistics & numerical data*
  • Disease Outbreaks / statistics & numerical data*
  • Humans
  • Incidence
  • Internet / statistics & numerical data*
  • Models, Statistical*
  • Population Surveillance / methods*
  • Proportional Hazards Models*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Tuberculosis / epidemiology*