DrivAER: Identification of driving transcriptional programs in single-cell RNA sequencing data

GigaScience. 2020 Dec 10;9(12):giaa122. doi: 10.1093/gigascience/giaa122.

Abstract

Background: Single-cell RNA sequencing (scRNA-seq) unfolds complex transcriptomic datasets into detailed cellular maps. Despite recent success, there is a pressing need for specialized methods tailored towards the functional interpretation of these cellular maps.

Findings: Here, we present DrivAER, a machine learning approach for the identification of driving transcriptional programs using autoencoder-based relevance scores. DrivAER scores annotated gene sets on the basis of their relevance to user-specified outcomes such as pseudotemporal ordering or disease status. DrivAER iteratively evaluates the information content of each gene set with respect to the outcome variable using autoencoders. We benchmark our method using extensive simulation analysis as well as comparison to existing methods for functional interpretation of scRNA-seq data. Furthermore, we demonstrate that DrivAER extracts key pathways and transcription factors that regulate complex biological processes from scRNA-seq data.

Conclusions: By quantifying the relevance of annotated gene sets with respect to specified outcome variables, DrivAER helps identify the molecular mechanisms underlying those outcomes.
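
The per-gene-set scoring loop described in the Findings can be illustrated with a minimal sketch. This is not the DrivAER implementation. As a stand-in for the autoencoder it uses a one-dimensional linear bottleneck (the first principal component, found by power iteration), and as a stand-in relevance score it uses the squared Pearson correlation between that one-dimensional code and the outcome. All function names, the toy data, and both substitutions are assumptions made purely for illustration.

```python
import math
import random


def center_columns(matrix):
    """Subtract each gene's mean so the projection captures variation only."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def bottleneck_code(matrix, iterations=200, seed=0):
    """Hypothetical stand-in for an autoencoder bottleneck.

    Projects a cells-by-genes matrix onto its first principal axis,
    estimated by power iteration on X^T X.
    """
    x = center_columns(matrix)
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in x[0]]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for _ in range(iterations):
        proj = [sum(a * b for a, b in zip(row, w)) for row in x]
        new_w = [sum(p * row[j] for p, row in zip(proj, x)) for j in range(len(w))]
        norm = math.sqrt(sum(v * v for v in new_w))
        if norm == 0:  # gene set has no variance at all
            break
        w = [v / norm for v in new_w]
    return [sum(a * b for a, b in zip(row, w)) for row in x]


def squared_correlation(a, b):
    """Stand-in relevance score: squared Pearson correlation in [0, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov * cov / (var_a * var_b)


def score_gene_sets(expression, gene_sets, outcome):
    """Score each gene set by how well its low-dimensional code tracks the outcome."""
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in expression]
        if not present:
            continue
        # Rearrange gene-wise vectors into a cells-by-genes matrix.
        matrix = [list(row) for row in zip(*(expression[g] for g in present))]
        scores[name] = squared_correlation(bottleneck_code(matrix), outcome)
    # Most relevant gene set first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


def make_toy_data(n_cells=60, seed=0):
    """Toy data: four genes follow a pseudotime-like outcome, four are pure noise."""
    rng = random.Random(seed)
    outcome = [i / (n_cells - 1) for i in range(n_cells)]
    expression = {}
    for j in range(4):
        expression[f"driver{j}"] = [2.0 * t + rng.gauss(0, 0.2) for t in outcome]
        expression[f"noise{j}"] = [rng.gauss(0, 1) for _ in outcome]
    gene_sets = {
        "driving_program": [f"driver{j}" for j in range(4)],
        "unrelated_program": [f"noise{j}" for j in range(4)],
    }
    return expression, gene_sets, outcome


expression, gene_sets, outcome = make_toy_data()
scores = score_gene_sets(expression, gene_sets, outcome)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

On the toy data, the gene set whose members follow the outcome ranks above the set of noise genes. A nonlinear autoencoder and a flexible predictive model would be needed to capture relationships that this linear sketch cannot.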

Keywords: Autoencoder; machine learning; manifold interpretation; single-cell RNA sequencing; transcription factor.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression Profiling
  • Machine Learning
  • RNA*
  • Sequence Analysis, RNA
  • Single-Cell Analysis*
  • Transcriptome

Substances

  • RNA