Weekly dengue forecasts in Iquitos, Peru; San Juan, Puerto Rico; and Singapore

PLoS Negl Trop Dis. 2020 Oct 16;14(10):e0008710. doi: 10.1371/journal.pntd.0008710. eCollection 2020 Oct.

Abstract

Background: Predictive models can serve as early warning systems and can be used to forecast future risk of various infectious diseases. Conventionally, regression and time series models are used to forecast dengue incidence, using dengue surveillance (e.g., case counts) and weather data. However, these models may be limited in terms of model assumptions and the number of predictors that can be included. Machine learning (ML) methods are designed to work with a large number of predictors and thus offer an appealing alternative. Here, we compared the performance of ML algorithms with that of regression models in predicting dengue cases and outbreaks from 4 up to 12 weeks in advance. Because many countries lack sufficient health surveillance infrastructure, we also evaluated the contribution of dengue surveillance and weather data to the predictive power of these models.
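Forecasting "4 to 12 weeks in advance" means each training example pairs information available up to some week with the outcome observed several weeks later. The sketch below shows one common way to build such examples from a weekly case series using lagged counts. The function name `make_horizon_dataset` and the parameters `n_lags` and `horizon` are hypothetical illustrations; the abstract does not describe the study's actual feature construction, which also included weather data.

```python
from typing import List, Tuple


def make_horizon_dataset(
    cases: List[float], n_lags: int, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: pair each window of `n_lags` past weekly counts
    with the count observed `horizon` weeks after the window's last week."""
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    X: List[List[float]] = []
    y: List[float] = []
    # t is the index of the most recent week the forecaster is allowed to see.
    for t in range(n_lags - 1, len(cases) - horizon):
        X.append(cases[t - n_lags + 1 : t + 1])  # weeks t-n_lags+1 .. t
        y.append(cases[t + horizon])             # target: week t+horizon
    return X, y


# Usage: 10 weeks of counts, 3 lagged weeks as predictors, 4-week-ahead target.
X, y = make_horizon_dataset(list(range(10)), n_lags=3, horizon=4)
print(X[0], y[0])  # [0, 1, 2] 6
```

Increasing `horizon` from 4 to 12 shrinks the number of usable examples and widens the gap between the newest predictor and the target, which is one reason longer-range forecasts are harder.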

Methods: We developed ML, regression, and time series models to forecast weekly dengue case counts and outbreaks in Iquitos, Peru; San Juan, Puerto Rico; and Singapore from 1990 to 2016. Forecasts were generated using the available weekly dengue surveillance and weather data. We evaluated the agreement between model forecasts and actual dengue observations using the Mean Absolute Error (MAE) and the Matthews Correlation Coefficient (MCC).
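The two evaluation metrics can be illustrated with a short sketch: MAE for weekly case-count forecasts and MCC for binary outbreak/no-outbreak forecasts. This is a minimal, dependency-free implementation of the standard definitions, not the study's code; returning 0 when the MCC denominator is zero (for example, when every week is predicted as non-outbreak) is an assumed convention that the abstract does not specify.

```python
import math
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and forecast weekly counts."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def matthews_corrcoef(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """MCC for binary outbreak labels; ranges from -1 (total disagreement) to 1 (perfect)."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must be of equal length")
    tp = sum(a and p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any confusion-matrix margin is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Usage with made-up values (not study data):
print(mean_absolute_error([10, 20, 30], [12, 18, 30]))
print(matthews_corrcoef([True, False, True, False], [True, False, True, False]))  # 1.0
```

MCC is often preferred over accuracy for outbreak detection because outbreak weeks are usually rare, and MCC accounts for all four cells of the confusion matrix.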

Results: For near-term predictions of weekly case counts using surveillance data, ML models had 21% and 33% less error than regression and time series models, respectively. However, when using weather data only, ML models did not demonstrate a practical advantage. When forecasting weekly dengue outbreaks 12 weeks in advance, ML models achieved a maximum MCC of 0.61.
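A statement such as "21% less error" is typically a relative reduction in an error metric. The sketch below assumes it refers to the relative reduction in MAE of the ML model against a baseline model; the abstract does not state the exact calculation, and the MAE values in the usage line are hypothetical, not figures from the study.

```python
def percent_less_error(mae_model: float, mae_baseline: float) -> float:
    """Relative reduction (in %) of the model's MAE compared with a baseline MAE.
    Assumes 'less error' means (baseline - model) / baseline."""
    if mae_baseline <= 0:
        raise ValueError("baseline MAE must be positive")
    return 100.0 * (mae_baseline - mae_model) / mae_baseline


# Hypothetical MAEs: baseline 20 cases/week, ML model 15 cases/week.
print(percent_less_error(15.0, 20.0))  # 25.0
```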

Conclusions: Our results identified two scenarios in which ML models are advantageous over regression models: 1) predicting weekly dengue case counts 4 weeks ahead when dengue surveillance data are available and 2) predicting weekly dengue outbreaks 12 weeks ahead when dengue surveillance data are unavailable. Given these advantages, dengue early warning systems may be improved by including ML models.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Dengue / epidemiology*
  • Disease Outbreaks
  • Forecasting
  • Humans
  • Models, Biological
  • Peru / epidemiology
  • Population Surveillance
  • Puerto Rico / epidemiology
  • Singapore / epidemiology
  • Time Factors
  • Weather

Grants and funding

This research was supported by internal funding provided by the Charles Stark Draper Laboratory Inc. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.