Network embedding-based representation learning for single cell RNA-seq data

Nucleic Acids Res. 2017 Nov 2;45(19):e166. doi: 10.1093/nar/gkx750.

Abstract

Single cell RNA-seq (scRNA-seq) techniques can reveal valuable insights into cell-to-cell heterogeneity. Projecting high-dimensional data into a low-dimensional subspace is a powerful general strategy for mining such big data. However, scRNA-seq suffers from higher noise and lower coverage than traditional bulk RNA-seq, which brings new computational difficulties. One major challenge is how to deal with frequent drop-out events. These events, usually caused by the stochastic burst effect in gene transcription and the technical failure of RNA transcript capture, often cause traditional dimension reduction methods to work inefficiently. To overcome this problem, we have developed a novel Single Cell Representation Learning (SCRL) method based on network embedding. This method can efficiently implement data-driven non-linear projection and incorporate prior biological knowledge (such as pathway information) to learn more meaningful low-dimensional representations for both cells and genes. Benchmark results show that SCRL outperforms other dimension reduction methods on several recent scRNA-seq datasets.
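
The drop-out problem described in the abstract can be illustrated with a minimal sketch. This is not the SCRL method. It assumes a hypothetical simulated count matrix, models drop-out as independent random zeroing of entries (a simplification of the transcriptional and capture effects named above), and uses PCA via singular value decomposition as a stand-in for a traditional linear dimension reduction method. All names, sizes, and rates are illustrative assumptions.

```python
import numpy as np


def simulate_dropout(counts, rate, rng):
    """Zero each entry independently with probability `rate` (assumed model)."""
    keep = rng.random(counts.shape) >= rate
    return counts * keep


def pca_project(matrix, n_components=2):
    """Project rows onto the top principal components (linear baseline)."""
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)

# Hypothetical data: 100 cells x 200 genes, two groups of 50 cells.
base = rng.poisson(lam=5.0, size=(100, 200)).astype(float)
base[:50, :20] += 10.0  # the first group over-expresses 20 marker genes

# Apply an assumed 50% drop-out rate to mimic sparse scRNA-seq counts.
observed = simulate_dropout(base, rate=0.5, rng=rng)

zero_fraction_before = float((base == 0).mean())
zero_fraction_after = float((observed == 0).mean())

# Log-transform, then reduce each cell to two dimensions.
embedding = pca_project(np.log1p(observed))

print(f"zeros before drop-out: {zero_fraction_before:.3f}")
print(f"zeros after drop-out:  {zero_fraction_after:.3f}")
print("embedding shape:", embedding.shape)
```

Comparing the embedding of `observed` with that of `base` gives a rough sense of how the added zeros blur the separation between the two groups under a linear projection, which is the difficulty the abstract attributes to traditional methods.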

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Female
  • Gene Expression Profiling / methods
  • Gene Regulatory Networks / genetics*
  • Germ Cells / metabolism
  • Humans
  • Male
  • Reproducibility of Results
  • Sequence Analysis, RNA / methods*
  • Single-Cell Analysis / methods*