Sub-Cluster Identification through Semi-Supervised Optimization of Rare-Cell Silhouettes (SCISSORS) in single-cell RNA-sequencing

Bioinformatics. 2023 Aug 1;39(8):btad449. doi: 10.1093/bioinformatics/btad449.

Abstract

Motivation: Single-cell RNA-sequencing (scRNA-seq) has enabled the molecular profiling of thousands to millions of cells simultaneously in biologically heterogenous samples. Currently, the common practice in scRNA-seq is to determine cell type labels through unsupervised clustering and the examination of cluster-specific genes. However, even small differences in analysis and parameter choosing can greatly alter clustering results and thus impose great influence on which cell types are identified. Existing methods largely focus on determining the optimal number of robust clusters, which can be problematic for identifying cells of extremely low abundance due to their subtle contributions toward overall patterns of gene expression.

Results: Here, we present a carefully designed framework, SCISSORS, which accurately profiles subclusters within broad cluster(s) for the identification of rare cell types in scRNA-seq data. SCISSORS employs silhouette scoring for the estimation of heterogeneity of clusters and reveals rare cells in heterogenous clusters by a multi-step semi-supervised reclustering process. Additionally, SCISSORS provides a method for the identification of marker genes of high specificity to the cell type. SCISSORS is wrapped around the popular Seurat R package and can be easily integrated into existing Seurat pipelines.

Availability and implementation: SCISSORS, including source code and vignettes, are freely available at https://github.com/jr-leary7/SCISSORS.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Gene Expression Profiling* / methods
  • RNA
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods

Substances

  • RNA