Forecasting tuberculosis using diabetes-related Google Trends data

Pathog Glob Health. 2020 Jul;114(5):236-241. doi: 10.1080/20477724.2020.1767854. Epub 2020 May 26.

Abstract

Online activity-based data can be used to aid infectious disease forecasting. Our aim was to exploit the converging nature of the tuberculosis (TB) and diabetes epidemics to forecast TB case numbers. To this end, we extended TB prediction models based on traditional data with diabetes-related Google searches. We obtained data on the weekly case numbers of TB in Germany from June 8th, 2014, to May 5th, 2019. Internet search data (Google Trends data, GTD) were obtained from a Google Trends search for 'diabetes' over the corresponding interval. A seasonal autoregressive integrated moving average (SARIMA) model (0,1,1)(1,0,0)[52] was selected to describe the weekly TB case numbers with and without GTD as an external regressor. We cross-validated the SARIMA models to obtain the root mean squared errors (RMSE). We repeated this procedure with autoregressive feed-forward neural network (NNAR) models using 5-fold cross-validation. To simulate a data-poor surveillance setting, we also tested traditional and GTD-extended models against a hold-out dataset, training them on a shortened 52-week period with missing values. Cross-validation resulted in an RMSE of 20.83 for the traditional model and 18.56 for the GTD-extended model. Cross-validation of the NNAR models showed a mean RMSE of 19.49 for the traditional model and 18.99 for the GTD-extended model. When we tested the models trained on the shortened dataset with missing values, the GTD-extended models achieved significantly better prediction than the traditional models (p < 0.001). The GTD-extended models outperformed the traditional models in all assessed model evaluation parameters. Using online activity-based data regarding diabetes can improve TB forecasting, but further validation is warranted.
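The comparison described in the abstract, a baseline model against the same model extended with an external regressor and scored by RMSE on data not used for fitting, can be illustrated with a minimal sketch. This is not the authors' code and does not implement SARIMA or NNAR: the two forecasters below (a historical mean and a one-variable least-squares fit) are simple stand-ins, the rolling-origin scheme is only one possible form of time-series cross-validation, and all function names and the toy numbers are assumptions made for illustration.

```python
import math
from typing import Callable, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_forecast(y_hist: Sequence[float], x_hist: Sequence[float], x_next: float) -> float:
    """Stand-in baseline: predict the historical mean and ignore the regressor."""
    return sum(y_hist) / len(y_hist)


def regression_forecast(y_hist: Sequence[float], x_hist: Sequence[float], x_next: float) -> float:
    """Stand-in extended model: least-squares line y = a + b*x fitted on the history."""
    x_bar = sum(x_hist) / len(x_hist)
    y_bar = sum(y_hist) / len(y_hist)
    var_x = sum((x - x_bar) ** 2 for x in x_hist)
    if var_x == 0:
        return y_bar  # regressor is constant, fall back to the mean
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_hist, y_hist)) / var_x
    return y_bar + slope * (x_next - x_bar)


def rolling_origin_rmse(
    y: Sequence[float],
    x: Sequence[float],
    forecaster: Callable[[Sequence[float], Sequence[float], float], float],
    min_train: int,
) -> float:
    """Refit on y[:t], x[:t] for each t >= min_train and score one-step-ahead forecasts."""
    actual, predicted = [], []
    for t in range(min_train, len(y)):
        predicted.append(forecaster(y[:t], x[:t], x[t]))
        actual.append(y[t])
    return rmse(actual, predicted)


# Toy data (invented for illustration, not from the study): y depends linearly on x.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_y = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]

baseline_error = rolling_origin_rmse(toy_y, toy_x, mean_forecast, min_train=3)
extended_error = rolling_origin_rmse(toy_y, toy_x, regression_forecast, min_train=3)
print(f"baseline RMSE: {baseline_error:.3f}, extended RMSE: {extended_error:.3f}")
```

In a real analysis the two stand-in forecasters would be replaced by fitted time-series models with and without the search-volume regressor; the scoring loop itself would stay the same.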

Keywords: Diabetes; Forecasting; Google Trends; Surveillance; Tuberculosis.

Publication types

  • Evaluation Study

MeSH terms

  • Diabetes Mellitus / epidemiology*
  • Epidemics*
  • Epidemiological Monitoring
  • Forecasting
  • Germany / epidemiology
  • Humans
  • Machine Learning
  • Neural Networks, Computer*
  • Tuberculosis / epidemiology*

Grants and funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.