Forecasting the West Nile Virus in the United States: An Extensive Novel Data Streams-Based Time Series Analysis and Structural Equation Modeling of Related Digital Searching Behavior

JMIR Public Health Surveill. 2019 Feb 28;5(1):e9176. doi: 10.2196/publichealth.9176.

Abstract

Background: West Nile virus is an arbovirus responsible for an infection that tends to peak during the late summer and early fall. Tools monitoring Web searches are emerging as powerful sources of data, especially concerning infectious diseases such as West Nile virus.

Objective: This study aimed at exploring the potential predictive power of West Nile virus-related Web searches.

Methods: Different novel data streams, including Google Trends, WikiTrends, YouTube, and Google News, were used to extract search trends. Data regarding West Nile virus cases were obtained from the Centers for Disease Control and Prevention. Data were analyzed using regression, times series analysis, structural equation modeling, and clustering analysis.

Results: In the regression analysis, an association between Web searches and "real-world" epidemiological figures was found. The best seasonal autoregressive integrated moving average model with explicative variable (SARIMAX) was found to be (0,1,1)x(0,1,1)4. Using data from 2004 to 2015, we were able to predict data for 2016. From the structural equation modeling, the consumption of West Nile virus-related news fully mediated the relation between Google Trends and the consumption of YouTube videos, as well as the relation between the latter variable and the number of West Nile virus cases. Web searches fully mediated the relation between epidemiological figures and the consumption of YouTube videos, as well as the relation between epidemiological data and the number of accesses to the West Nile virus-related Wikipedia page. In the clustering analysis, the consumption of news was most similar to the Web searches pattern, which was less close to the consumption of YouTube videos and least similar to the behavior of accessing West Nile virus-related Wikipedia pages.

Conclusions: Our study demonstrated an association between epidemiological data and search patterns related to the West Nile virus. Based on this correlation, further studies are needed to examine the practicality of these findings.

Keywords: Google Trends; West Nile virus; forecasting model; infodemiology; infoveillance; seasonal autoregressive integrated moving average model with explicative variable (SARIMAX).