Graph Representation Learning-Based Fixed-Length Clinical Feature Vector Generation from Heterogeneous Medical Records

Stud Health Technol Inform. 2024 Jan 25;310:715-719. doi: 10.3233/SHTI231058.

Abstract

Transformation of patient data extracted from a database into fixed-length numerical vectors requires expertise in both the relevant medical domain and data manipulation; manual feature design is therefore labor-intensive. In this study, we propose a machine learning-based method for this purpose, applicable to electronic medical data recorded during hospitalization, which uses unsupervised feature extraction based on graph embedding. Unsupervised learning is performed on a heterogeneous graph using Graph2Vec, and whether the resulting embeddings capture clinically useful information is evaluated by using them to predict readmission within 30 days of discharge. The embedded representations are observed to improve predictive performance significantly as the information contained in the graph increases, indicating the suitability of the proposed method for feature design corresponding to clinical information.

Keywords: Electronic health record; feature extraction; graph embedding; machine learning; unsupervised learning.
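
The idea of turning a per-patient heterogeneous graph into a fixed-length vector can be illustrated with a minimal sketch. This is not the authors' implementation: Graph2Vec extracts Weisfeiler-Lehman (WL) subtree features from each graph and then learns an embedding with a Doc2Vec-style model, whereas the sketch below keeps the WL step and substitutes simple feature hashing for the learned embedding so that it runs with the standard library alone. The graph schema (one admission node linked to diagnosis, drug, and laboratory nodes), the node labels, and the function names `wl_subtree_features` and `hashed_vector` are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def wl_subtree_features(nodes, edges, iterations=2):
    """Count Weisfeiler-Lehman subtree labels of an undirected labeled graph.

    nodes: dict mapping node id -> string label (here, a typed clinical item)
    edges: list of (u, v) node-id pairs
    """
    neighbors = {n: [] for n in nodes}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    labels = dict(nodes)
    features = Counter(labels.values())  # iteration 0: the raw node labels
    for _ in range(iterations):
        new_labels = {}
        for n in nodes:
            # A node's new label summarizes its own label and its neighbors' labels.
            neigh = sorted(labels[m] for m in neighbors[n])
            signature = labels[n] + "|" + ",".join(neigh)
            new_labels[n] = hashlib.md5(signature.encode()).hexdigest()
        labels = new_labels
        features.update(labels.values())
    return features


def hashed_vector(features, dim=16):
    """Map a feature Counter to a fixed-length vector by feature hashing.

    Assumption: this stands in for the learned Doc2Vec-style embedding
    used by Graph2Vec; it is not a trained representation.
    """
    vec = [0.0] * dim
    for feat, count in features.items():
        idx = int(hashlib.md5(feat.encode()).hexdigest(), 16) % dim
        vec[idx] += count
    return vec


# Hypothetical hospitalization graph: node labels encode the node type.
patient_nodes = {
    0: "admission",
    1: "diagnosis:I50",
    2: "drug:furosemide",
    3: "lab:BNP_high",
}
patient_edges = [(0, 1), (0, 2), (0, 3)]

patient_vector = hashed_vector(wl_subtree_features(patient_nodes, patient_edges))
print(len(patient_vector))  # the length equals `dim` regardless of graph size
```

Because the vector length depends only on `dim`, records with different numbers of diagnoses, drugs, or laboratory results map to inputs of the same shape, which is what a downstream classifier such as a 30-day readmission model requires.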

MeSH terms

  • Databases, Factual
  • Hospitalization
  • Humans
  • Knowledge
  • Medical Records*
  • Records*