Learning a Latent Space of Highly Multidimensional Cancer Data

Pac Symp Biocomput. 2020:25:379-390.

Abstract

We introduce a Unified Feature Disentanglement Network (UFDN) trained on The Cancer Genome Atlas (TCGA), which we refer to as UFDN-TCGA. By applying the network to two classification tasks, cancer status and cancer type, we demonstrate that UFDN-TCGA learns a biologically relevant, low-dimensional latent space of high-dimensional gene expression data. On both tasks, UFDN-TCGA performs comparably to random forest methods. The UFDN allows for continuous, partial interpolation between distinct cancer types. Furthermore, we analyze the genes differentially expressed between skin cutaneous melanoma (SKCM) samples and the same samples interpolated into glioblastoma (GBM). We demonstrate that our interpolations consist of relevant metagenes that recapitulate known glioblastoma mechanisms.
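The idea of "partial interpolation between cancer types" can be illustrated with a toy conditional decoder. The sketch below assumes, as in UFDN-style models, that the decoder receives a latent code z together with a domain (cancer-type) code. Under that assumption, blending the SKCM and GBM one-hot codes with a weight alpha gives a continuous path from one cancer type to the other. The decoder here is a random linear map standing in for the trained network, not the UFDN-TCGA architecture. The gene count, latent size, and all function names (`decode`, `interpolate_domain`, `top_shifted_genes`) are hypothetical. Ranking genes by their expression shift is only a simplified stand-in for a proper differential expression test across many samples.

```python
import numpy as np

rng = np.random.default_rng(0)

N_GENES = 50      # hypothetical: real TCGA expression profiles cover ~20,000 genes
LATENT_DIM = 8    # hypothetical latent size
DOMAINS = ["SKCM", "GBM"]

# Hypothetical stand-in for a trained decoder: a fixed random linear map
# from [latent code z, domain code d] to log-scale expression.
W_z = rng.normal(size=(N_GENES, LATENT_DIM))
W_d = rng.normal(size=(N_GENES, len(DOMAINS)))


def decode(z, d):
    """Toy decoder: expression = W_z @ z + W_d @ d (not the UFDN architecture)."""
    return W_z @ z + W_d @ d


def one_hot(name):
    """One-hot domain code for a cancer type listed in DOMAINS."""
    v = np.zeros(len(DOMAINS))
    v[DOMAINS.index(name)] = 1.0
    return v


def interpolate_domain(z, src, dst, alpha):
    """Decode z with a domain code blended from src (alpha=0) to dst (alpha=1)."""
    d = (1.0 - alpha) * one_hot(src) + alpha * one_hot(dst)
    return decode(z, d)


def top_shifted_genes(original, interpolated, k=5):
    """Return indices and signed shifts of the k genes with the largest absolute change."""
    diff = interpolated - original
    order = np.argsort(-np.abs(diff))
    return order[:k], diff[order[:k]]


# Hypothetical latent code of a single SKCM sample.
z = rng.normal(size=LATENT_DIM)

skcm = interpolate_domain(z, "SKCM", "GBM", 0.0)     # unchanged SKCM profile
halfway = interpolate_domain(z, "SKCM", "GBM", 0.5)  # partial interpolation
gbm = interpolate_domain(z, "SKCM", "GBM", 1.0)      # fully interpolated into GBM

genes, shifts = top_shifted_genes(skcm, gbm)
print("top shifted gene indices:", genes)
print("shifts:", np.round(shifts, 3))
```

Because this toy decoder is linear, the halfway profile is exactly the average of the two endpoints. A trained nonlinear decoder would generally not have that property, which is part of what makes intermediate interpolations informative.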

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Brain Neoplasms*
  • Computational Biology
  • Glioblastoma* / genetics
  • Humans
  • Melanoma* / genetics
  • Skin Neoplasms* / genetics