A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set

Laila Rasmy; Yonghui Wu; Ningtao Wang; Xin Geng; W Jim Zheng; Fei Wang; Hulin Wu; Hua Xu; Degui Zhi

doi:10.1016/j.jbi.2018.06.011

J Biomed Inform. 2018 Aug:84:11-16. doi: 10.1016/j.jbi.2018.06.011. Epub 2018 Jun 15.

Authors

Laila Rasmy¹, Yonghui Wu², Ningtao Wang³, Xin Geng⁴, W Jim Zheng¹, Fei Wang⁵, Hulin Wu³, Hua Xu¹, Degui Zhi⁶

Affiliations

¹ School of Biomedical Informatics, University of Texas Health Science Center at Houston (UTHealth), Houston, TX, United States.
² Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States.
³ Department of Biostatistics and Data Science, School of Public Health, University of Texas Health Science Center at Houston (UTHealth), Houston, TX, United States.
⁴ BGI-Shenzhen, Shenzhen, 518083, China.
⁵ Department of Healthcare Policy and Research, Weill Cornell Medicine, Cornell University, New York, NY, United States.
⁶ School of Biomedical Informatics, University of Texas Health Science Center at Houston (UTHealth), Houston, TX, United States. Electronic address: Degui.Zhi@uth.tmc.edu.

Abstract

Recurrent neural networks (RNNs) have recently been applied to predicting disease onset risk from Electronic Health Record (EHR) data. While these models have shown promising results on relatively small data sets, their generalizability and transferability, and their applicability to different patient populations across hospitals, have not been evaluated. In this study, we evaluated an RNN model, RETAIN, on Cerner Health Facts® EMR data for heart failure onset risk prediction. Our data set included over 150,000 heart failure patients and over 1,000,000 controls from nearly 400 hospitals. RETAIN achieved an AUC of 82%, compared with an AUC of 79% for logistic regression, demonstrating the power of more expressive deep learning models for EHR predictive modeling. Prediction performance fluctuated across patient groups and varied from hospital to hospital. We also trained RETAIN models on individual hospitals and found that these models could be applied to other hospitals with a reduction in AUC of only about 3.6%. Our results demonstrate the capability of RNNs for predictive modeling with large and heterogeneous EHR data and pave the way for future improvements.
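
The hospital-to-hospital comparison described in the abstract rests on computing AUC separately for each site. The following is a minimal sketch of that kind of evaluation, not the authors' code: the function names, the hospital labels, and the toy scores are all hypothetical, and AUC is computed here by direct pairwise comparison of cases and controls, which is only practical for small samples.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the fraction of (case, control) pairs in which the case
    receives the higher score; tied pairs count as half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def auc_by_group(groups, labels, scores):
    """Compute AUC separately for each group, e.g. each hospital."""
    buckets = defaultdict(lambda: ([], []))
    for g, y, s in zip(groups, labels, scores):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in buckets.items()}


# Hypothetical toy data: 1 = heart failure case, 0 = control.
hospitals = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.5, 0.3, 0.4]

overall = auc(labels, scores)
per_hospital = auc_by_group(hospitals, labels, scores)
print(overall, per_hospital)
```

In this toy example the pooled AUC hides a large gap between the two hypothetical sites, which is the kind of variation a per-hospital breakdown is meant to expose.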

Keywords: Deep learning; EHR; Predictive modeling; RNN.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Aged
Aged, 80 and over
Algorithms
Area Under Curve
Case-Control Studies
Computer Simulation
Databases, Factual
Deep Learning*
Electronic Health Records*
Female
Heart Failure / diagnosis*
Humans
Logistic Models
Male
Medical Informatics / methods
Middle Aged
Neural Networks, Computer*
Reproducibility of Results

Grants and funding

R01 HG008115/HG/NHGRI NIH HHS/United States