Who was at risk for COVID-19 late in the US pandemic? Insights from a population health machine learning model

Elijah A Adeoye; Yelena Rozenfeld; Jennifer Beam; Karen Boudreau; Emily J Cox; James M Scanlan

doi:10.1007/s11517-022-02549-5

Med Biol Eng Comput. 2022 Jul;60(7):2039-2049. doi: 10.1007/s11517-022-02549-5. Epub 2022 May 11.

Authors

Elijah A Adeoye¹, Yelena Rozenfeld², Jennifer Beam¹, Karen Boudreau¹, Emily J Cox³, James M Scanlan⁴

Affiliations

¹ Providence St. Joseph Health, 1801 Lind Avenue S.W. Valley Office Park, Morin Bldg, 1st Floor, Renton, WA, 98057-9016, USA.
² Providence St. Joseph Health, 1801 Lind Avenue S.W. Valley Office Park, Morin Bldg, 1st Floor, Renton, WA, 98057-9016, USA. Yelena.Rozenfeld@providence.org.
³ Providence Medical Research Center, 105 W 8th Ave, Suite 250E, Spokane, WA, 99204, USA.
⁴ Swedish Center for Research and Innovation, 800 Fifth Ave, 11th floor, Seattle, WA, USA.

Abstract

Notable discrepancies in vulnerability to COVID-19 infection have been identified between specific population groups and regions in the USA. The purpose of this study was to estimate the likelihood of COVID-19 infection using a machine-learning algorithm that can be updated continuously based on health care data. Patient records were extracted for all COVID-19 nasal swab PCR tests performed within the Providence St. Joseph Health system from February to October of 2020. A total of 316,599 participants were included in this study, and approximately 7.7% (n = 24,358) tested positive for COVID-19. A gradient boosting model, LightGBM (LGBM), predicted risk of initial infection with an area under the receiver operating characteristic curve of 0.819. Factors that predicted infection were cough, fever, being a member of the Hispanic or Latino community, being Spanish speaking, having a history of diabetes or dementia, and living in a neighborhood with housing insecurity. A model trained on sociodemographic, environmental, and medical history data performed well in predicting risk of a positive COVID-19 test. This model could be used to tailor education, public health policy, and resources for communities that are at the greatest risk of infection.
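
As an illustration of the headline metric, the sketch below shows one way an area under the receiver operating characteristic curve can be computed from predicted risk scores, and checks the positivity rate quoted above. It is not the authors' pipeline: the function name `auroc`, the variable names, and the toy labels and scores are hypothetical, and only the counts 24,358 and 316,599 come from the abstract. It uses the pairwise (Mann-Whitney) definition of the metric, which is simple but quadratic in sample size.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, NOT from the study: 1 = positive test, 0 = negative.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8, 0.3, 0.4]
toy_auc = auroc(toy_labels, toy_scores)

# Positivity rate from the counts reported in the abstract.
positivity = 24358 / 316599
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.819 means that a randomly chosen positive patient was scored above a randomly chosen negative patient roughly 82% of the time.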

Keywords: COVID-19; Infection; Risk; Social determinants of health.

MeSH terms

COVID-19* / epidemiology
Humans
Machine Learning
Pandemics
Population Health*
SARS-CoV-2