Interpreting clinical latent representations using autoencoders and probabilistic models

David Chushig-Muzo; Cristina Soguero-Ruiz; Pablo de Miguel-Bohoyo; Inmaculada Mora-Jiménez

doi:10.1016/j.artmed.2021.102211

Artif Intell Med. 2021 Dec:122:102211. doi: 10.1016/j.artmed.2021.102211. Epub 2021 Nov 9.

Authors

David Chushig-Muzo¹, Cristina Soguero-Ruiz², Pablo de Miguel-Bohoyo³, Inmaculada Mora-Jiménez⁴

Affiliations

¹ Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Fuenlabrada 28943, Spain. Electronic address: cd.chushig@alumnos.urjc.es.
² Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Fuenlabrada 28943, Spain. Electronic address: cristina.soguero@urjc.es.
³ University Hospital of Fuenlabrada, Fuenlabrada 28943, Spain. Electronic address: pablo.miguel@salud.madrid.org.
⁴ Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Fuenlabrada 28943, Spain. Electronic address: inmaculada.mora@urjc.es.

PMID: 34823836
DOI: 10.1016/j.artmed.2021.102211

Abstract

Electronic health records (EHRs) are a valuable data source that, in conjunction with deep learning (DL) methods, has yielded important results in several domains and helped support decision-making. Owing to the remarkable advances achieved by DL-based models, autoencoders (AEs) are becoming widely used in health care. However, AEs rely on nonlinear transformations, which makes them black-box models and limits their interpretability, a property that is vital in the clinical setting. To obtain insights from AE latent representations, we propose a methodology that combines probabilistic models based on Gaussian mixture models with hierarchical clustering supported by the Kullback-Leibler divergence. To validate the methodology from a clinical viewpoint, we used real-world data extracted from EHRs of the University Hospital of Fuenlabrada (Spain). The records belonged to healthy patients and to chronic hypertensive and diabetic patients. Experimental results showed that our approach can find groups of patients with similar health conditions by identifying patterns associated with diagnosis and drug codes. This work opens up promising opportunities for interpreting the representations obtained by AE-based models, shedding some light on the decision-making process followed by clinical experts in daily practice.
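The abstract does not spell out how the Kullback-Leibler divergence feeds the hierarchical clustering, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a Gaussian mixture model has already been fitted to the AE latent codes, that each component is summarized by a mean and covariance, and that components are then grouped by average-linkage clustering on a symmetrized KL dissimilarity. The component parameters, the function names `gaussian_kl` and `symmetric_kl_matrix`, the linkage method, and the choice of two groups are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    # slogdet is numerically safer than log(det(...)).
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (
        np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - k + logdet1 - logdet0
    )


def symmetric_kl_matrix(means, covs):
    """Pairwise symmetrized KL, KL(i||j) + KL(j||i), between components."""
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = gaussian_kl(means[i], covs[i], means[j], covs[j])
            d += gaussian_kl(means[j], covs[j], means[i], covs[i])
            dist[i, j] = dist[j, i] = d
    return dist


# Hypothetical mixture components in a 2-D latent space (made-up values,
# standing in for parameters estimated by a fitted Gaussian mixture model).
means = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.3, 4.8]])
covs = np.array([np.eye(2)] * 4)

dist = symmetric_kl_matrix(means, covs)

# Average-linkage hierarchical clustering on the condensed dissimilarities
# (an assumed choice; the abstract does not name a linkage method).
tree = linkage(squareform(dist), method="average")

# Cut the dendrogram into two groups of components (assumed number).
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

The symmetrized form is used here because plain KL divergence is not symmetric, while `squareform` expects a symmetric matrix with a zero diagonal. In this toy setup the two components near the origin end up in one group and the two near (5, 5) in the other.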

Keywords: Autoencoder; Chronic diseases; Clustering; Electronic health records; Gaussian mixture model; Learning latent representations.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cluster Analysis
Electronic Health Records*
Humans
Models, Statistical*
Normal Distribution