Estimating disease burden using Internet data

Riyi Qiu; Mirsad Hadzikadic; Sha Yu; Lixia Yao

doi:10.1177/1460458218810743

Health Informatics J. 2019 Dec;25(4):1863-1877. doi: 10.1177/1460458218810743. Epub 2018 Nov 29.

Authors

Riyi Qiu, Mirsad Hadzikadic, Sha Yu¹, Lixia Yao²

Affiliations

¹ The University of North Carolina at Charlotte, USA.
² The University of North Carolina at Charlotte, USA; Mayo Clinic, USA.

PMID: 30488754
DOI: 10.1177/1460458218810743

Abstract

Data on disease burden are often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage and social media data, specifically Google search volume, Wikipedia page view counts, and the frequency of disease mentions on Twitter, correlated with disease burden, measured by prevalence and treatment cost, for 1633 diseases over an 11-year period. We also applied the least absolute shrinkage and selection operator (LASSO) to predict disease burden. We found that Google search volume was relatively strongly correlated with the burden of 39 of the 1633 diseases, including viral hepatitis, diabetes mellitus, multiple sclerosis, and hemorrhoids. Wikipedia and Twitter data were strongly correlated with the burden of 15 and 7 diseases, respectively. However, an accurate analysis must consider each condition's characteristics, including its acute or chronic nature, severity, familiarity to the public, and the presence of stigma.

Keywords: Google search; Twitter; Wikipedia; data mining; disease burden; least absolute shrinkage and selection operator; prevalence; treatment cost.
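
The two analysis steps named in the abstract, correlating each Internet signal with disease burden and fitting a LASSO model to predict burden, can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration: the abstract does not say which correlation coefficient was used (Pearson is assumed here), the yearly series are synthetic, the penalty value `lam` is arbitrary, and the helper names `pearson`, `center`, `standardize`, and `lasso_cd` are hypothetical. The LASSO solver is a plain coordinate-descent implementation written with the standard library only, not the authors' code.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def center(values):
    """Subtract the mean so the series has mean zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def standardize(values):
    """Center and scale to unit (population) standard deviation."""
    c = center(values)
    sd = math.sqrt(sum(v * v for v in c) / len(c))
    return [v / sd for v in c]


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    Assumes y and every column of X are already centered (no intercept).
    X is a list of rows; returns the coefficient list b.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # constant (all-zero after centering) column
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


if __name__ == "__main__":
    # Synthetic 11-year series for one hypothetical disease (not real data).
    rng = random.Random(0)
    years = 11
    google = [50 + 3 * t + rng.gauss(0, 2) for t in range(years)]
    wikipedia = [rng.gauss(100, 10) for _ in range(years)]
    twitter = [rng.gauss(20, 5) for _ in range(years)]
    prevalence = [0.8 * g + rng.gauss(0, 1) for g in google]

    signals = {"google": google, "wikipedia": wikipedia, "twitter": twitter}
    for name, series in signals.items():
        print(name, "r =", round(pearson(series, prevalence), 3))

    # Standardize predictors, center the outcome, then fit the LASSO.
    cols = [standardize(s) for s in signals.values()]
    X = [list(row) for row in zip(*cols)]
    coefs = lasso_cd(X, center(prevalence), lam=0.5)
    print(dict(zip(signals, coefs)))
```

The L1 penalty is what makes LASSO a selection operator: predictors whose association with the outcome falls below the penalty are given a coefficient of exactly zero, which is useful when several Internet signals compete to explain the burden of a single disease. In practice the penalty would be chosen by cross-validation rather than fixed by hand.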

MeSH terms

Cost of Illness*
Data Analysis
Electronic Data Processing / instrumentation*
Electronic Data Processing / methods
Electronic Data Processing / statistics & numerical data
Humans
Internet / statistics & numerical data
Social Media / classification*
Social Media / instrumentation
Social Media / statistics & numerical data