Interpreting clinical latent representations using autoencoders and probabilistic models

Artif Intell Med. 2021 Dec:122:102211. doi: 10.1016/j.artmed.2021.102211. Epub 2021 Nov 9.

Abstract

Electronic health records (EHRs) are a valuable data source that, in conjunction with deep learning (DL) methods, has yielded important results in different domains and helped support decision-making. Owing to the remarkable advances achieved by DL-based models, autoencoders (AEs) are becoming extensively used in health care. Nevertheless, AE-based models rely on nonlinear transformations, which makes them black boxes and limits their interpretability, a property that is vital in the clinical setting. To obtain insights from AE latent representations, we propose a methodology that combines probabilistic models based on Gaussian mixture models with hierarchical clustering supported by the Kullback-Leibler divergence. To validate the methodology from a clinical viewpoint, we used real-world data extracted from EHRs of the University Hospital of Fuenlabrada (Spain). The records corresponded to healthy patients and to chronic hypertensive and diabetic patients. Experimental results showed that our approach can find groups of patients with similar health conditions by identifying patterns associated with diagnosis and drug codes. This work opens up promising opportunities for interpreting the representations obtained by AE-based models, shedding light on the decision-making process of clinical experts in daily practice.
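The idea of grouping Gaussian components by Kullback-Leibler divergence can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that Gaussian components with diagonal covariance have already been fitted to the latent codes, uses a symmetrised KL divergence as the dissimilarity, and merges components with average linkage. The function names (`kl_diag_gauss`, `sym_kl`, `agglomerate`) and the toy component parameters are hypothetical and chosen only for illustration.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two Gaussians with diagonal covariance."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def sym_kl(a, b):
    """Symmetrised KL, used here as a dissimilarity (an assumption of this sketch)."""
    return (kl_diag_gauss(a[0], a[1], b[0], b[1])
            + kl_diag_gauss(b[0], b[1], a[0], a[1]))


def agglomerate(components, n_groups):
    """Average-linkage agglomerative clustering of Gaussian components."""
    clusters = [[i] for i in range(len(components))]

    def linkage(c1, c2):
        dists = [sym_kl(components[i], components[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_groups:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical mixture components as (mean, variance) pairs in a 2-D latent space.
components = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([0.2, -0.1], [1.1, 0.9]),
    ([5.0, 5.0], [1.0, 1.0]),
    ([5.3, 4.8], [0.8, 1.2]),
]
groups = agglomerate(components, n_groups=2)
print(groups)
```

In this toy setting, components that lie close together in the latent space end up in the same group. In a clinical application, each group would then be inspected for the diagnosis and drug codes that characterise its patients.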

Keywords: Autoencoder; Chronic diseases; Clustering; Electronic health records; Gaussian mixture model; Learning latent representations.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Electronic Health Records*
  • Humans
  • Models, Statistical*
  • Normal Distribution