Bias correction models for electronic health records data in the presence of non-random sampling

Jiyu Kim; Rebecca Anthopolos; Judy Zhong

doi:10.1093/biomtc/ujae014


Biometrics. 2024 Jan 29;80(1):ujae014. doi: 10.1093/biomtc/ujae014.

Authors

Jiyu Kim¹, Rebecca Anthopolos¹, Judy Zhong¹

Affiliation

¹ Department of Population Health, NYU Grossman School of Medicine, New York University, 180 Madison Ave, New York, NY 10016, United States.

PMID: 38488466
PMCID: PMC10941326 (available on 2025-03-15)

Abstract

Electronic health records (EHRs) contain rich clinical information for millions of patients and are increasingly used for public health research. However, non-random inclusion of subjects in EHRs can result in selection bias, driven by factors such as demographics, socioeconomic status, healthcare referral patterns, and underlying health status. Although this issue is well documented, little work has been done to develop or apply bias-correction methods, often because most of these factors are unavailable in EHRs. To address this gap, we propose a series of Heckman-type bias correction methods that incorporate social determinants of health as selection covariates to model the probability of non-random sampling into EHRs. Through simulations under various settings, we demonstrate the effectiveness of the proposed methods in correcting bias in both the association coefficient and the outcome mean. Our methods augment the utility of EHRs for public health inference, as we show by estimating the prevalence of cardiovascular disease and its correlation with risk factors in the New York City network of EHRs.
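The abstract does not spell out the estimators. As a rough illustration of the classical Heckman two-step correction that the proposed methods are named after, the sketch below fits a probit model for inclusion, computes the inverse Mills ratio, and adds it to the outcome regression. This is not the authors' implementation: the function names (`fit_probit`, `heckman_two_step`), the use of Fisher scoring, and the simulated data are illustrative assumptions. A hypothetical covariate `w` stands in for a social determinants of health measure that affects inclusion but not the outcome. The outcome-mean step further assumes that covariates are available for every unit in the target population, including those not in the EHR.

```python
import math
import numpy as np

# Vectorized standard normal pdf/cdf built on math.erf (avoids a SciPy dependency).
_erf = np.vectorize(math.erf)

def norm_pdf(t):
    return np.exp(-0.5 * t**2) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + _erf(t / math.sqrt(2.0)))

def fit_probit(Z, s, n_iter=100, tol=1e-10):
    """Probit MLE for P(S=1 | Z) = Phi(Z @ gamma), fitted by Fisher scoring."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = Z @ gamma
        p = np.clip(norm_cdf(eta), 1e-10, 1 - 1e-10)
        d = norm_pdf(eta)
        score = Z.T @ ((s - p) * d / (p * (1 - p)))
        info = (Z * (d**2 / (p * (1 - p)))[:, None]).T @ Z  # expected information
        step = np.linalg.solve(info, score)
        gamma += step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

def heckman_two_step(X, Z, s, y):
    """Classical Heckman two-step estimator (hypothetical helper).

    X: outcome covariates; Z: selection covariates (ideally with an excluded
    variable, e.g. an SDOH measure); s: 0/1 inclusion indicator;
    y: outcome, used only where s == 1.
    Returns (beta, coefficient on the inverse Mills ratio, i.e. rho * sigma).
    """
    gamma = fit_probit(Z, s)
    sel = s == 1
    eta = Z[sel] @ gamma
    mills = norm_pdf(eta) / norm_cdf(eta)  # inverse Mills ratio lambda(Z gamma)
    X_aug = np.column_stack([X[sel], mills])
    coef, *_ = np.linalg.lstsq(X_aug, y[sel], rcond=None)
    return coef[:-1], coef[-1]

# Hypothetical simulation: w drives inclusion only; errors u and e are correlated.
rng = np.random.default_rng(0)
n, rho = 20_000, 0.7
x = rng.standard_normal(n)
w = rng.standard_normal(n)
u = rng.standard_normal(n)
e = rho * u + math.sqrt(1 - rho**2) * rng.standard_normal(n)
s = (0.5 + 1.0 * w + 0.5 * x + u > 0).astype(float)
y = 1.0 + 2.0 * x + e  # true beta = (1, 2); true population mean of y is 1

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), w, x])
sel = s == 1

beta_naive, *_ = np.linalg.lstsq(X[sel], y[sel], rcond=None)  # complete-case OLS
beta_heck, rho_sigma = heckman_two_step(X, Z, s, y)
mean_naive = y[sel].mean()           # mean among included units only
mean_heck = (X @ beta_heck).mean()   # needs X for every unit, included or not
```

In this setup the complete-case intercept and outcome mean are pulled upward because inclusion is positively correlated with the outcome error, while the two-step estimates should land close to the true values. The correction leans on the excluded selection covariate `w`; without such a variable, the inverse Mills ratio is identified only through the nonlinearity of the probit link.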

Keywords: EHRs; SNAR; bias correction; social determinants of health.

MeSH terms

Bias
Electronic Health Records*
Health Status*
Humans
Risk Factors
Selection Bias
