Assessing the added value of linking electronic health records to improve the prediction of self-reported COVID-19 testing and diagnosis

Dylan Clark-Boucher; Jonathan Boss; Maxwell Salvatore; Jennifer A Smith; Lars G Fritsche; Bhramar Mukherjee

doi:10.1371/journal.pone.0269017


PLoS One. 2022 Jul 25;17(7):e0269017. doi: 10.1371/journal.pone.0269017. eCollection 2022.

Authors

Dylan Clark-Boucher¹, Jonathan Boss¹, Maxwell Salvatore¹,², Jennifer A Smith²,³, Lars G Fritsche¹,⁴,⁵, Bhramar Mukherjee¹,²,⁴,⁵

Affiliations

¹ Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America.
² Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America.
³ Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, United States of America.
⁴ Rogel Cancer Center, University of Michigan, Ann Arbor, Michigan, United States of America.
⁵ Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America.

Abstract

Since the beginning of the Coronavirus Disease 2019 (COVID-19) pandemic, a focus of research has been to identify risk factors associated with COVID-19-related outcomes, such as testing and diagnosis, and to use them to build prediction models. Existing studies have used data from digital surveys or electronic health records (EHRs), but very few have linked the two sources to build joint predictive models. In this study, we used survey data on 7,054 patients from the Michigan Genomics Initiative biorepository to evaluate how well self-reported data could be integrated with electronic records for the purpose of modeling COVID-19-related outcomes. We observed that among survey respondents, self-reported COVID-19 diagnosis captured a larger number of cases than the corresponding EHRs, suggesting that self-reported outcomes may be better than EHRs for distinguishing COVID-19 cases from controls. In the modeling context, we compared the utility of survey- and EHR-derived predictor variables in models of survey-reported COVID-19 testing and diagnosis. We found that survey-derived predictors produced uniformly stronger models than EHR-derived predictors (likely due to their specificity, temporal proximity, and breadth), and that combining predictors from both sources offered no consistent improvement over using survey-based predictors alone. Our results suggest that, even though general EHRs are useful in predictive models of COVID-19 outcomes, they may not be essential in those models when rich survey data are already available. The two data sources together may offer better prediction of COVID-19 severity, but we did not have enough severe cases among the survey respondents to assess that hypothesis in our study.
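The abstract compares models built from survey-derived, EHR-derived, and combined predictor sets. One common way to compare the discrimination of such models is the area under the ROC curve (AUC). The sketch below assumes that metric; the function name `auc`, the model labels, and all scores are hypothetical toy values, not the study's data or its actual modeling pipeline. It computes AUC directly as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, counting ties as one half.

```python
from itertools import product


def auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Counts, over every (case, control) pair, how often the case gets the
    higher score; ties count as one half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            wins += 1.0
        elif case_score == control_score:
            wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: outcome labels (1 = self-reported diagnosis) and
# risk scores from three hypothetical models; not values from the study.
y = [1, 1, 1, 0, 0, 0, 0, 0]
scores = {
    "survey": [0.9, 0.7, 0.6, 0.5, 0.3, 0.2, 0.4, 0.1],
    "ehr": [0.6, 0.4, 0.5, 0.5, 0.3, 0.6, 0.2, 0.1],
    "combined": [0.9, 0.6, 0.7, 0.5, 0.3, 0.3, 0.4, 0.1],
}
for name, s in scores.items():
    print(f"{name}: AUC = {auc(y, s):.3f}")
```

The pairwise loop is quadratic in sample size and is meant only to make the definition explicit; for thousands of respondents a rank-based formula or a library routine would be the practical choice.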

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

COVID-19 Testing
COVID-19* / diagnosis
COVID-19* / epidemiology
Electronic Health Records*
Humans
Self Report
Surveys and Questionnaires

Grants and funding

The research presented here was funded by the National Science Foundation (https://www.nsf.gov/) under grant DMS 1712933 (BM), the National Institutes of Health (https://www.nih.gov) under grant 5R01HG008773-05 (BM) and 3P30CA046592-32-S3 (BM), and the Michigan Collaborative Addiction Resources & Education System (https://micaresed.org) under grant 1UG3CA267907-01 (BM). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.