GapClust is a light-weight approach distinguishing rare cells from voluminous single cell expression profiles

Nat Commun. 2021 Jul 7;12(1):4197. doi: 10.1038/s41467-021-24489-8.

Abstract

Single cell RNA sequencing (scRNA-seq) is a powerful tool for detailing the cellular landscape within complex tissues. Large-scale single cell transcriptomics provides both opportunities and challenges for identifying rare cells that play crucial roles in development and disease. Here, we develop GapClust, a light-weight algorithm to detect rare cell types from ultra-large scRNA-seq datasets with state-of-the-art speed and memory efficiency. Benchmarking on diverse experimental datasets demonstrates the superior performance of GapClust compared to other recently proposed methods. When applied to an intestine dataset and a 68k PBMC dataset, GapClust identifies tuft cells and a previously unrecognised subtype of monocyte, respectively.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Datasets as Topic
  • HEK293 Cells
  • Humans
  • Intestinal Mucosa / cytology
  • Jurkat Cells
  • RNA-Seq / methods*
  • Single-Cell Analysis / methods*
  • Software