A robust nonlinear low-dimensional manifold for single cell RNA-seq data

BMC Bioinformatics. 2020 Jul 21;21(1):324. doi: 10.1186/s12859-020-03625-z.

Abstract

Background: Modern developments in single-cell sequencing technologies enable broad insights into cellular state. Single-cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden our understanding of cellular heterogeneity in tissues and organs. Analysis of these sparse, high-dimensional experimental results requires dimension reduction. Several methods have been developed to estimate low-dimensional embeddings for filtered and normalized single-cell data. However, methods have yet to be developed for unfiltered and unnormalized count data that estimate uncertainty in the low-dimensional space. We present a nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data.

Results: Gene expression in a single cell is modeled as a noisy draw from a Gaussian process that maps low-dimensional latent positions to the high-dimensional expression space. This model is called the Gaussian process latent variable model (GPLVM). We model residual errors with a heavy-tailed Student's t-distribution to estimate a manifold that is robust to technical and biological noise found in normalized scRNA-seq data. We compare our approach to common dimension reduction tools across a diverse set of scRNA-seq data sets to highlight our model's ability to enable important downstream tasks such as clustering, inferring cell developmental trajectories, and visualizing high-throughput experiments on available experimental data.
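
The generative idea described in the Results can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the squared-exponential (RBF) kernel, the fixed hyperparameters, and the names `rbf_kernel`, `student_t_logpdf`, and `simulate` are all assumptions made for illustration. The sketch only simulates data from a GPLVM-style model with Student's t residuals and evaluates the heavy-tailed log-density; it does not perform inference or kernel learning.

```python
import numpy as np
from math import lgamma, log, pi


def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on latent positions X (n_cells x latent_dim).

    The kernel choice is an assumption; the abstract does not name one.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)


def student_t_logpdf(r, df, scale):
    """Elementwise log-density of a zero-centred Student's t residual r."""
    z = r / scale
    const = (lgamma((df + 1) / 2) - lgamma(df / 2)
             - 0.5 * log(df * pi) - log(scale))
    return const - 0.5 * (df + 1) * np.log1p(z ** 2 / df)


def simulate(n_cells=50, n_genes=20, latent_dim=2, df=4.0, scale=0.1, seed=0):
    """Draw latent positions, GP function values, and noisy observations."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_cells, latent_dim))       # latent cell positions
    K = rbf_kernel(X) + 1e-6 * np.eye(n_cells)       # jitter for stability
    L = np.linalg.cholesky(K)
    F = L @ rng.normal(size=(n_cells, n_genes))      # one GP draw per gene
    noise = scale * rng.standard_t(df, size=(n_cells, n_genes))
    Y = F + noise                                    # heavy-tailed residuals
    return X, F, Y


X, F, Y = simulate()
log_lik = student_t_logpdf(Y - F, df=4.0, scale=0.1).sum()
```

Because the Student's t log-density decays only logarithmically in the residual, a single large outlier is penalized far less than under a Gaussian error model, which is the sense in which such a likelihood is robust.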

Conclusion: We show that our adaptive robust statistical approach to estimate a nonlinear manifold is well suited for raw, unfiltered gene counts from high-throughput sequencing technologies for visualization, exploration, and uncertainty estimation of cell states.

Keywords: Dimension reduction; Gaussian process latent variable model; Manifold learning; Nonlinear maps; Robust model; Single cell RNA sequencing.

MeSH terms

  • Blood Cells / metabolism
  • Gene Expression Regulation
  • Humans
  • Models, Genetic
  • Neurons / metabolism
  • Nonlinear Dynamics*
  • Normal Distribution
  • Principal Component Analysis
  • RNA-Seq*
  • Single-Cell Analysis / methods*
  • Time Factors