Variational autoencoders learn transferrable representations of metabolomics data

Commun Biol. 2022 Jun 30;5(1):645. doi: 10.1038/s42003-022-03579-3.

Abstract

Dimensionality reduction approaches are commonly used for the deconvolution of high-dimensional metabolomics datasets into underlying core metabolic processes. However, current state-of-the-art methods are largely incapable of detecting nonlinearities in metabolomics data. Variational Autoencoders (VAEs) are a deep learning method designed to learn nonlinear latent representations which generalize to unseen data. Here, we trained a VAE on a large-scale metabolomics population cohort of human blood samples comprising over 4,500 individuals. We analyzed the pathway composition of the latent space using a global feature importance score, which demonstrated that latent dimensions represent distinct cellular processes. To demonstrate model generalizability, we generated latent representations of unseen metabolomics datasets on type 2 diabetes, acute myeloid leukemia, and schizophrenia and found significant correlations with clinical patient groups. Notably, the VAE representations showed stronger effects than latent dimensions derived by linear and nonlinear principal component analysis. Taken together, we demonstrate that the VAE is a powerful method that learns biologically meaningful, nonlinear, and transferrable latent representations of metabolomics data.
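
To illustrate the kind of model described in the abstract, the sketch below shows a minimal VAE for tabular metabolomics profiles. The layer sizes, latent dimensionality, preprocessing, and loss weighting are assumptions for illustration only and are not the architecture or hyperparameters reported in the paper.

```python
# Minimal VAE sketch for tabular metabolomics data (illustrative assumptions,
# not the published architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MetaboliteVAE(nn.Module):
    def __init__(self, n_metabolites: int, latent_dim: int = 16, hidden: int = 256):
        super().__init__()
        # Encoder maps metabolite concentrations to a Gaussian latent posterior.
        self.encoder = nn.Sequential(nn.Linear(n_metabolites, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        # Decoder reconstructs the metabolite profile from the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_metabolites),
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # Sample z ~ N(mu, sigma^2) via the reparameterization trick.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar


def vae_loss(x_hat, x, mu, logvar, beta: float = 1.0):
    # Reconstruction term (Gaussian likelihood ~ MSE) plus the KL divergence
    # between the approximate posterior and a standard-normal prior.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl


if __name__ == "__main__":
    # Placeholder batch of z-scored metabolite profiles (values are random here).
    model = MetaboliteVAE(n_metabolites=500)
    x = torch.randn(32, 500)
    x_hat, mu, logvar = model(x)
    loss = vae_loss(x_hat, x, mu, logvar)
    loss.backward()
    # After training, the latent means (mu) serve as the low-dimensional
    # representation that can be projected onto unseen cohorts.
```

In this kind of setup, transferring the representation to a new cohort simply means passing the new samples through the trained encoder and using the latent means as features for downstream group comparisons, analogous to projecting onto principal components.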

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Diabetes Mellitus, Type 2*
  • Humans
  • Metabolomics
  • Principal Component Analysis