A deep attention model to forecast the Length Of Stay and the in-hospital mortality right on admission from ICD codes and demographic data

Gaspard Harerimana; Jong Wook Kim; Beakcheol Jang

doi:10.1016/j.jbi.2021.103778

J Biomed Inform. 2021 Jun;118:103778. doi: 10.1016/j.jbi.2021.103778. Epub 2021 Apr 17.

Authors

Gaspard Harerimana¹, Jong Wook Kim², Beakcheol Jang³

Affiliations

¹ Department of Computer Science, Sangmyung University, Seoul, Republic of Korea. Electronic address: gharelim@alumni.cmu.edu.
² Department of Computer Science, Sangmyung University, Seoul, Republic of Korea. Electronic address: jkim@smu.ac.kr.
³ Graduate School of Information, Yonsei University, Seoul, Republic of Korea. Electronic address: bjang@yonsei.ac.kr.

PMID: 33872817

Abstract

Leveraging longitudinal data from Electronic Health Records (EHR) to produce actionable clinical insights is a central concern of recent studies. Unforecasted extended hospitalizations account for a disproportionate share of resource use, lower-quality inpatient care, and avoidable deaths. The ability to predict the Length of Stay (LoS) and mortality in the early stages of an admission provides opportunities to improve care and prevent avoidable losses. Forecasting in-hospital mortality gives clinicians the insight they need to make decisions and helps hospitals allocate resources; predicting LoS and mortality within the first day of admission is therefore a difficult but paramount endeavor. The biggest challenge is that only limited data are available at that point, so the prediction must draw on the history of previous admissions and on the free-text diagnosis recorded immediately on admission. We propose a model that uses multi-modal EHR structured medical codes and key demographic information to classify LoS into three classes, Short LoS (LoS ⩽ 10 days), Medium LoS (10 < LoS ⩽ 30 days) and Long LoS (LoS > 30 days), and to predict mortality as a binary classification of whether the patient dies during the current admission. Predictions use only data available within 24 h of admission. The key predictors are previous ICD9 diagnosis codes, ICD9 procedure codes, key demographic data, and the free-text diagnosis of the current admission recorded on admission. We propose a Hierarchical Attention Network (HAN) model in two variants, HAN-LoS and HAN-Mor, and train it on a dataset of over 45,321 admissions from the de-identified MIMIC-III dataset. To improve prediction, the attention mechanisms focus on the most influential past admissions and on the most influential codes within those admissions. For a fair performance evaluation, we implemented the HAN model and compared it with previous approaches.

With dataset-balancing techniques, HAN-LoS achieved an AUROC of over 0.82 and a Micro-F1 score of 0.24, and HAN-Mor achieved an AUROC of 0.87, outperforming existing baselines that use structured medical codes as well as clinical time series for LoS and mortality forecasting. By predicting mortality and LoS with the same model, we show that, with little tuning, the proposed model can be used for other clinical prediction tasks such as phenotyping, decompensation, readmission prediction, and survival analysis.
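The two prediction targets in the abstract can be made concrete with a small labeling sketch. It assumes LoS is measured in fractional days between admission and discharge timestamps and that an in-hospital death flag is available (in MIMIC-III these would typically come from the ADMITTIME, DISCHTIME and HOSPITAL_EXPIRE_FLAG columns of the ADMISSIONS table, though the paper's exact preprocessing may differ). The function names and integer class ids are hypothetical; only the class boundaries come from the abstract.

```python
from datetime import datetime

# Hypothetical integer ids for the three LoS classes (boundaries from the abstract).
SHORT_LOS, MEDIUM_LOS, LONG_LOS = 0, 1, 2


def los_days(admit_time: datetime, discharge_time: datetime) -> float:
    """Length of stay in fractional days between admission and discharge."""
    return (discharge_time - admit_time).total_seconds() / 86400.0


def los_class(days: float) -> int:
    """Map LoS in days to Short / Medium / Long."""
    if days < 0:
        raise ValueError("discharge precedes admission")
    if days <= 10:
        return SHORT_LOS   # LoS <= 10 days
    if days <= 30:
        return MEDIUM_LOS  # 10 < LoS <= 30 days
    return LONG_LOS        # LoS > 30 days


def mortality_label(died_in_hospital: bool) -> int:
    """Binary target: 1 if the patient died during the current admission."""
    return 1 if died_in_hospital else 0


if __name__ == "__main__":
    # MIMIC-III shifts dates into the future, hence the year 2130.
    admit = datetime(2130, 1, 1, 8, 0)
    discharge = datetime(2130, 1, 13, 20, 0)
    days = los_days(admit, discharge)
    print(days, los_class(days), mortality_label(False))  # prints: 12.5 1 0
```

Note that the boundaries are inclusive on the upper side, so a stay of exactly 10 days is Short and exactly 30 days is Medium, matching the inequalities in the abstract.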
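The abstract describes attention at two levels: over the codes inside each past admission, and over the admissions themselves. The following is a minimal sketch of that data flow, assuming a common additive (tanh) attention-pooling formulation; it is not the authors' implementation. It omits any recurrent encoder that may precede each attention step, the free-text diagnosis and demographic inputs, and the three-class LoS head, and its parameters are random and untrained. `HierarchicalAttentionSketch` and `attention_pool` are hypothetical names, and the ICD9 codes are illustrative only.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Vector], W: List[Vector], b: Vector,
                   u: Vector) -> Tuple[Vector, Vector]:
    """Additive attention: score_i = u . tanh(W v_i + b), weights = softmax(scores).

    Returns the attention-weighted sum of `vectors` and the weights themselves.
    """
    scores = []
    for v in vectors:
        hidden = [math.tanh(dot(row, v) + bias) for row, bias in zip(W, b)]
        scores.append(dot(u, hidden))
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


class HierarchicalAttentionSketch:
    """Codes -> admission vectors -> patient vector -> P(in-hospital death).

    Parameters are random and untrained; this only illustrates the data flow.
    """

    def __init__(self, vocab: Sequence[str], dim: int = 8, seed: int = 0):
        rng = random.Random(seed)

        def vec() -> Vector:
            return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

        self.embed: Dict[str, Vector] = {code: vec() for code in vocab}
        # Code-level attention: which codes matter within one admission.
        self.W_code, self.b_code, self.u_code = [vec() for _ in range(dim)], [0.0] * dim, vec()
        # Admission-level attention: which past admissions matter.
        self.W_adm, self.b_adm, self.u_adm = [vec() for _ in range(dim)], [0.0] * dim, vec()
        # Logistic output layer for the binary mortality task.
        self.w_out, self.b_out = vec(), 0.0

    def forward(self, admissions: List[List[str]]):
        adm_vectors, code_weights = [], []
        for codes in admissions:
            pooled, weights = attention_pool([self.embed[c] for c in codes],
                                             self.W_code, self.b_code, self.u_code)
            adm_vectors.append(pooled)
            code_weights.append(weights)
        patient, adm_weights = attention_pool(adm_vectors, self.W_adm,
                                              self.b_adm, self.u_adm)
        logit = dot(self.w_out, patient) + self.b_out
        prob = 1.0 / (1.0 + math.exp(-logit))
        return prob, adm_weights, code_weights


if __name__ == "__main__":
    history = [["4019", "4280"], ["25000", "4019", "3893"]]  # illustrative ICD9 codes
    model = HierarchicalAttentionSketch(sorted({c for adm in history for c in adm}))
    prob, adm_w, code_w = model.forward(history)
    print(round(prob, 3), [round(w, 3) for w in adm_w])
```

The returned attention weights are what make the model inspectable: `adm_w` indicates how much each past admission contributed to the patient representation, and each list in `code_w` does the same for the codes inside one admission.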
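The abstract credits "dataset balancing techniques" for the reported scores but does not say which ones (the keywords mention boosting and class imbalance). As one illustrative possibility only, the sketch below shows random oversampling, which duplicates randomly chosen minority-class samples until every class matches the largest one. `random_oversample` is a hypothetical helper, not necessarily the technique the authors used.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(samples: Sequence[T], labels: Sequence[Hashable],
                      seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Duplicate randomly chosen minority-class samples until all classes
    match the size of the largest class. Apply to the training split only."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    if not by_class:
        return [], []
    target = max(len(items) for items in by_class.values())
    pairs = []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((s, label) for s in items + extra)
    rng.shuffle(pairs)  # avoid long runs of a single class
    return [s for s, _ in pairs], [y for _, y in pairs]


if __name__ == "__main__":
    labels = [0] * 6 + [1] * 3 + [2] * 1   # e.g. short / medium / long LoS
    samples = [f"adm{i}" for i in range(len(labels))]
    _, balanced = random_oversample(samples, labels)
    print(Counter(balanced))  # every class now has 6 samples
```

Oversampling should be applied after the train/test split; duplicating samples before splitting would let copies of the same admission appear in both sets and inflate the evaluation.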
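Both models are evaluated with AUROC. For a binary task such as in-hospital mortality, AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; `auroc` is a hypothetical name, and in practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently. How the authors averaged AUROC over the three LoS classes (for example, one-vs-rest) is not stated in the abstract.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary AUROC as P(score of random positive > score of random negative),
    with ties counted as 1/2. O(P * N) pairwise version, for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    died = [1, 0, 0, 1, 0]
    predicted_risk = [0.9, 0.3, 0.6, 0.4, 0.1]
    # 5 of the 6 positive/negative pairs are ranked correctly.
    print(auroc(died, predicted_risk))
```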

Keywords: Boosting; Class imbalance; Electronic health record; Hierarchical attention network; Length of stay.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Electronic Health Records
Hospital Mortality
Hospitalization*
Humans
International Classification of Diseases*
Length of Stay