Time interval uncertainty-aware and text-enhanced based disease prediction

J Biomed Inform. 2023 Mar:139:104239. doi: 10.1016/j.jbi.2022.104239. Epub 2022 Nov 7.

Abstract

Deep learning methods have achieved success in disease prediction using electronic health record (EHR) data, but most existing methods have some limitations. First, most methods apply a homogeneous decay to model the effect of the time interval on the information from a patient's previous visits. However, the effect of the time interval between a patient's visits is not always negative. For example, although the time interval between visits for patients with chronic diseases is relatively long, the previous visit remains highly important to the next visit, so the effect of the time interval cannot be treated as negative in this case. That is, the time interval affects previous visits in a nonmonotonic manner, and its effect can be positive, negative, or neutral. In addition, most methods do not take into account the effect of text information on prediction results. The text in the EHR describes the patient's past medical history and current symptoms, which is important for prediction. To address these issues, we propose a Time Interval Uncertainty-Aware and Text-Enhanced Based Disease Prediction Model, which uses the uncertain effects of time intervals and the patient's text information for disease prediction. First, we apply a cross-attention mechanism to generate a global representation of the patient from the patient's disease and text information in the EHR. Then, we use a key-query attention mechanism to obtain two importance weights for the two visit sequences, with and without time intervals, respectively. Finally, we perform disease prediction by making slight adjustments to the encoder part of the Transformer, a deep learning model based on a self-attention mechanism.
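
As a rough illustration of the key-query attention step, the NumPy sketch below scores a patient's visits against a single query twice: once using the visit embeddings alone and once using the visit embeddings combined with a time-interval embedding. Everything in it is assumed for illustration: the helper names, the additive interval embedding, the dimensions, and the random data are hypothetical, and the abstract does not specify how the model computes or combines these weights.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def visit_weights(query, keys):
    """Scaled dot-product score of each visit key against one query,
    normalized so the weights sum to 1."""
    d = query.shape[-1]
    return softmax(keys @ query / np.sqrt(d))

rng = np.random.default_rng(0)
n_visits, d = 5, 8
visits = rng.normal(size=(n_visits, d))        # hypothetical visit embeddings
interval_emb = rng.normal(size=(n_visits, d))  # hypothetical time-interval embeddings
global_repr = rng.normal(size=d)               # hypothetical global patient representation

# Importance weights for the sequence without and with time intervals.
w_without = visit_weights(global_repr, visits)
w_with = visit_weights(global_repr, visits + interval_emb)

# Signed per-visit difference: an entry may be positive, negative, or near zero,
# so the interval is not forced to act as a pure decay.
interval_effect = w_with - w_without
```

Because the two weight vectors are computed independently, the interval can raise or lower the importance of any given visit, which is one simple way to picture a nonmonotonic effect.
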
We compare our model with various state-of-the-art models on two publicly available datasets, MIMIC-III and MIMIC-IV, and select the 10 most frequent diseases in the dataset as the target diseases. On the MIMIC-III dataset, our model scores up to three percent higher than the best-performing baseline on the evaluation metrics.
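
Choosing target diseases by frequency can be sketched as follows. The toy diagnosis codes and the `top_k_diseases` helper are made up for illustration; the abstract does not describe the actual preprocessing applied to MIMIC-III or MIMIC-IV.

```python
from collections import Counter

def top_k_diseases(visits, k=10):
    """Return the k most frequent diagnosis codes across all visits."""
    counts = Counter(code for visit in visits for code in visit)
    return [code for code, _ in counts.most_common(k)]

# Hypothetical toy data: each visit is a list of diagnosis codes.
toy_visits = [["A", "B"], ["A", "C"], ["A", "B", "D"]]
targets = top_k_diseases(toy_visits, k=2)
```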

Keywords: Attention; Disease prediction; EHR data; Text information; Time interval.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Electronic Health Records*
  • Humans
  • Uncertainty