Predictive Modeling of Vaccination Uptake in US Counties: A Machine Learning-Based Approach

J Med Internet Res. 2021 Nov 25;23(11):e33231. doi: 10.2196/33231.

Abstract

Background: The COVID-19 pandemic has had an unprecedented impact worldwide, and countries such as the United States have reported among the highest incidences of COVID-19 cases. Within the United States, various sociodemographic factors have contributed to regional disparities. These disparities have resulted in the unequal spread of disease between US counties, underscoring the need for efficient and accurate predictive modeling strategies to inform public health officials and reduce the burden on health care systems. Furthermore, despite the widespread availability of COVID-19 vaccines across the United States, vaccination rates have stagnated, necessitating predictive modeling to identify the factors that most affect vaccination uptake.

Objective: This study aims to determine the association between sociodemographic factors and vaccine uptake across counties in the United States.

Methods: Sociodemographic data on fully vaccinated and unvaccinated individuals were sourced from several online databases, such as those of the US Centers for Disease Control and Prevention and the US Census Bureau COVID-19 Site. Machine learning analysis of these data was performed using XGBoost.
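To illustrate the general idea behind the Methods, the following is a minimal sketch of gradient boosting for a binary "low versus high uptake" label. It is not the study's pipeline and it does not use the XGBoost library: it swaps in a small, dependency-free boosted ensemble of one-split decision stumps, which is the same family of technique in a much simpler form. The two features (share of adults with a bachelor's degree, share of households with internet access), the six toy counties, the labels, and all function names are assumptions made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Return the one-split stump (feature, threshold, left, right) with least squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must put at least one county on each side
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        raise ValueError("no valid split: every feature is constant")
    return best[1:]

def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_boosted(X, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting on log loss: each stump fits the residuals y - p."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return stumps, learning_rate

def predict(model, row):
    stumps, learning_rate = model
    score = sum(learning_rate * stump_predict(s, row) for s in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0

def accuracy(y_true, y_pred):
    """Fraction of counties whose predicted label matches the true label."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy counties: [share with bachelor's degree, share with internet access]
X = [[0.15, 0.60], [0.18, 0.65], [0.20, 0.70],
     [0.35, 0.85], [0.40, 0.90], [0.45, 0.88]]
y = [0, 0, 0, 1, 1, 1]  # hypothetical labels: 1 = high vaccination uptake

model = fit_boosted(X, y)
preds = [predict(model, row) for row in X]
print("training accuracy:", accuracy(y, preds))
```

Accuracy here is measured on the same toy rows used for fitting, so it says nothing about generalization; a real analysis would evaluate on held-out counties.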

Results: Our model predicted COVID-19 vaccination uptake across US counties with 62% accuracy. In addition, it identified location, education, ethnicity, income, and household access to the internet as the most critical sociodemographic features in predicting vaccination uptake in US counties. Lastly, the model produced a choropleth demonstrating areas of low and high vaccination rates, which can be used by health care authorities in future pandemics to visualize and prioritize areas of low vaccination and design targeted vaccination campaigns.

Conclusions: Our study reveals that sociodemographic characteristics are predictors of vaccine uptake rates across counties in the United States and, if leveraged appropriately, can help policy makers and public health officials understand vaccine uptake rates and craft policies to improve them.

Keywords: COVID-19; SARS-CoV-2; United States; XGBoost; machine learning; model; prediction; public health; sociodemographic; sociodemographic factors; uptake; vaccine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19 Vaccines*
  • COVID-19*
  • Humans
  • Machine Learning
  • Pandemics
  • SARS-CoV-2
  • United States
  • Vaccination

Substances

  • COVID-19 Vaccines