Assessing the added value of linking electronic health records to improve the prediction of self-reported COVID-19 testing and diagnosis

PLoS One. 2022 Jul 25;17(7):e0269017. doi: 10.1371/journal.pone.0269017. eCollection 2022.

Abstract

Since the beginning of the Coronavirus Disease 2019 (COVID-19) pandemic, a focus of research has been to identify risk factors associated with COVID-19-related outcomes, such as testing and diagnosis, and use them to build prediction models. Existing studies have used data from digital surveys or electronic health records (EHRs), but very few have linked the two sources to build joint predictive models. In this study, we used survey data on 7,054 patients from the Michigan Genomics Initiative biorepository to evaluate how well self-reported data could be integrated with electronic records for the purpose of modeling COVID-19-related outcomes. We observed that among survey respondents, self-reported COVID-19 diagnosis captured a larger number of cases than the corresponding EHRs, suggesting that self-reported outcomes may be better than EHRs for distinguishing COVID-19 cases from controls. In the modeling context, we compared the utility of survey- and EHR-derived predictor variables in models of survey-reported COVID-19 testing and diagnosis. We found that survey-derived predictors produced uniformly stronger models than EHR-derived predictors (likely due to their specificity, temporal proximity, and breadth) and that combining predictors from both sources offered no consistent improvement compared to using survey-based predictors alone. Our results suggest that, even though general EHRs are useful in predictive models of COVID-19 outcomes, they may not be essential in those models when rich survey data are already available. The two data sources together may offer better prediction for COVID-19 severity, but we did not have enough severe cases among the survey respondents to assess that hypothesis in our study.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • COVID-19 Testing
  • COVID-19* / diagnosis
  • COVID-19* / epidemiology
  • Electronic Health Records*
  • Humans
  • Self Report
  • Surveys and Questionnaires

Grants and funding

The research presented here was funded by the National Science Foundation (https://www.nsf.gov/) under grant DMS 1712933 (BM), the National Institutes of Health (https://www.nih.gov) under grants 5R01HG008773-05 (BM) and 3P30CA046592-32-S3 (BM), and the Michigan Collaborative Addiction Resources & Education System (https://micaresed.org) under grant 1UG3CA267907-01 (BM). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.