Improved Real-Time Influenza Surveillance: Using Internet Search Data in Eight Latin American Countries

Leonardo Clemente; Fred Lu; Mauricio Santillana

doi:10.2196/12214

JMIR Public Health Surveill. 2019 Apr 4;5(2):e12214. doi: 10.2196/12214.

Authors

Leonardo Clemente^#¹ ², Fred Lu^#², Mauricio Santillana² ³

Affiliations

¹ School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico.
² Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.
³ Department of Pediatrics, Harvard Medical School, Boston, MA, United States.

^# Contributed equally.

PMID: 30946017
PMCID: PMC6470460
DOI: 10.2196/12214

Abstract

Background: Novel influenza surveillance systems that leverage Internet-based real-time data sources, including Internet search frequencies, social-network information, and crowd-sourced flu surveillance tools, have shown improved accuracy over the past few years in data-rich countries like the United States. These systems not only track flu activity accurately, but they also report flu estimates a week or more ahead of the publication of reports produced by healthcare-based systems, such as those implemented and managed by the Centers for Disease Control and Prevention. Previous work has shown that novel flu surveillance systems, like Google Flu Trends (GFT), have not yet delivered acceptable flu estimates in developing countries in Latin America.

Objective: The aim of this study was to show that recent methodological improvements in the use of Internet search engine information to track diseases can lead to improved retrospective flu estimates in multiple countries in Latin America.

Methods: A machine learning-based methodology that uses flu-related Internet search activity and historical information to monitor flu activity, named ARGO (AutoRegression with Google search), was extended to generate flu predictions for 8 Latin American countries (Argentina, Bolivia, Brazil, Chile, Mexico, Paraguay, Peru, and Uruguay) for the period from January 2012 to December 2016. These retrospective (out-of-sample) influenza activity predictions were compared with historically observed suspected flu cases in each country, as reported by FluNet, an influenza surveillance database maintained by the World Health Organization. For a baseline comparison, retrospective (out-of-sample) flu estimates were produced for the same time period using autoregressive models that only leverage historical flu activity information.
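The comparison described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the regression, the number of lags, or the training window, so the ridge regression, `n_lags=4`, `window=52`, and the function names `ridge_fit`, `rolling_predictions`, and `demo` are all assumptions chosen for illustration. The data in `demo` are synthetic, and the sketch assumes that search frequencies for the target week are available before the official case count.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def rolling_predictions(flu, search, n_lags=4, window=52, alpha=1.0):
    """One-step-ahead out-of-sample estimates, refit for every target week.

    flu:    (T,) weekly suspected-case counts
    search: (T, K) weekly search-term frequencies, or None for the
            autoregressive-only baseline
    Returns a dict mapping target week index -> estimate.
    """
    rows = []
    for t in range(n_lags, len(flu)):
        lags = flu[t - n_lags:t][::-1]  # most recent week first
        extra = search[t] if search is not None else np.empty(0)
        rows.append(np.concatenate([lags, extra]))
    X = np.array(rows)  # row i describes week n_lags + i
    y = flu[n_lags:]
    preds = {}
    for i in range(window, len(y)):
        # Train only on the `window` weeks that precede the target week.
        coef, intercept = ridge_fit(X[i - window:i], y[i - window:i], alpha)
        preds[n_lags + i] = float(X[i] @ coef + intercept)
    return preds


def demo(seed=0):
    """Compare both models on synthetic data (hypothetical, not FluNet)."""
    rng = np.random.default_rng(seed)
    T = 200
    weeks = np.arange(T)
    flu = 100 + 60 * np.sin(2 * np.pi * weeks / 52) + rng.normal(0, 15, T)
    # One informative search term plus one pure-noise term.
    search = np.column_stack([flu + rng.normal(0, 5, T), rng.normal(0, 1, T)])

    def rmse(preds):
        return float(np.sqrt(np.mean([(flu[t] - v) ** 2 for t, v in preds.items()])))

    argo_like = rolling_predictions(flu, search)
    ar_only = rolling_predictions(flu, None)
    return rmse(argo_like), rmse(ar_only)


if __name__ == "__main__":
    argo_rmse, ar_rmse = demo()
    print(f"ARGO-like RMSE: {argo_rmse:.2f}  AR-only RMSE: {ar_rmse:.2f}")
```

Refitting on a sliding window before each estimate keeps every prediction out-of-sample and lets the coefficients adapt over time; this is one plausible reading of the "self-correcting" behavior mentioned in the Conclusions, not a confirmed detail of the method.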

Results: Our results show that ARGO-like models outperform autoregressive models in predictive power in 6 out of 8 countries in the 2012-2016 time period. Moreover, ARGO significantly improves on historical flu estimates produced by the now-discontinued GFT for the time period of 2012-2015, for which GFT information is publicly available.

Conclusions: We demonstrate here that a self-correcting machine learning method, leveraging Internet-based disease-related search activity and historical flu trends, has the potential to produce reliable and timely flu estimates in multiple Latin American countries. This methodology may prove helpful to local public health officials who design and implement interventions aimed at mitigating the effects of influenza outbreaks. Our methodology generally outperforms both the now-discontinued GFT tool and autoregressive methodologies that exploit only historical flu activity to produce future disease estimates.

Keywords: developing countries; digital epidemiology; Google Flu Trends; influenza monitoring; influenza, human; machine learning; real-time disease surveillance.

©Leonardo Clemente, Fred Lu, Mauricio Santillana. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 04.04.2019.

Grants and funding

R01 GM130668/GM/NIGMS NIH HHS/United States