Factors associated with resistance to SARS-CoV-2 infection discovered using large-scale medical record data and machine learning

PLoS One. 2023 Feb 22;18(2):e0278466. doi: 10.1371/journal.pone.0278466. eCollection 2023.

Abstract

There have been over 621 million cases of COVID-19 worldwide, with over 6.5 million deaths. Despite the high secondary attack rate of COVID-19 in shared households, some exposed individuals do not contract the virus. Little is known, however, about whether the occurrence of COVID-19 resistance differs among people by health characteristics recorded in electronic health records (EHRs). In this retrospective analysis, we develop a statistical model to predict COVID-19 resistance in 8,536 individuals with prior COVID-19 exposure using demographics, diagnostic codes, outpatient medication orders, and count of Elixhauser comorbidities in EHR data from the COVID-19 Precision Medicine Platform Registry. Cluster analyses identified 5 patterns of diagnostic codes that distinguished resistant from non-resistant patients in our study population. Our models showed modest performance in predicting COVID-19 resistance (best-performing model AUROC = 0.61). Monte Carlo simulations indicated that the AUROC results are statistically significant (p < 0.001) for the testing set. We hope to validate the features found to be associated with resistance/non-resistance through more advanced association studies.
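
The abstract does not say how the Monte Carlo simulations were set up. One common way to test whether an AUROC differs from chance is a label-permutation test, sketched below as an illustration only. The function names, the toy data, and the choice of label shuffling are assumptions, not the authors' procedure.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(labels, scores, n_iter=1000, seed=0):
    """Monte Carlo p-value for the observed AUROC (hypothetical helper).

    Labels are shuffled n_iter times to simulate the null hypothesis that
    scores carry no information about the outcome. The +1 terms keep the
    estimate strictly positive.
    """
    rng = random.Random(seed)
    observed = auroc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        if auroc(shuffled, scores) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)


# Toy data, invented for illustration (1 = resistant, 0 = non-resistant).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc, toy_p = permutation_p_value(toy_labels, toy_scores, n_iter=200)
```

With this estimator the smallest attainable p-value is 1 / (n_iter + 1), so supporting a claim such as p < 0.001 requires at least 1,000 permutations.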

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19*
  • Electronic Health Records
  • Humans
  • Machine Learning
  • Retrospective Studies
  • SARS-CoV-2

Grants and funding

The data utilized were part of JH-CROWN: The COVID PMAP Registry, which is based on the contribution of many patients and clinicians and is funded by Hopkins inHealth, the Johns Hopkins Precision Medicine Program. Project-specific costs of data extraction were defrayed by funds from the Office of the Dean, JHU School of Medicine. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.