An active learning approach for clustering single-cell RNA-seq data

Lab Invest. 2022 Mar;102(3):227-235. doi: 10.1038/s41374-021-00639-w. Epub 2021 Jul 9.

Abstract

Single-cell RNA sequencing (scRNA-seq) data have been widely used to profile cellular heterogeneity at high resolution. Clustering is a crucial step of scRNA-seq data analysis because it makes it possible to identify previously undiscovered cell types. Most methods for clustering scRNA-seq data use an unsupervised learning strategy. Because the clustering step is separated from the cell annotation and labeling step, it is not uncommon for these methods to generate a completely unexpected clustering with poor biological interpretability, a result biologists generally do not want. To solve this problem, we proposed an active learning (AL) framework for clustering scRNA-seq data. The AL model employed a learning algorithm that actively queries biologists for labels, so that manual labeling is expected to be needed for only a subset of cells. To develop an optimal AL approach, we explored several key parameters of the AL model in experiments with four real scRNA-seq datasets. We demonstrate that the proposed AL model outperformed state-of-the-art unsupervised clustering methods with fewer than 1000 labeled cells. We therefore conclude that the AL model is a promising tool for clustering scRNA-seq data that achieves superior performance effectively and efficiently.
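The abstract does not specify which classifier or query strategy the AL model uses, so the following is only a minimal sketch of generic pool-based active learning, not the authors' method. It assumes a nearest-centroid classifier and margin-based uncertainty sampling (both swapped-in choices). The names `active_learning`, `predict_proba` and `oracle` are hypothetical. `oracle` stands in for the biologist who labels the queried cells, and `budget` caps the number of labeled cells, analogous to the "fewer than 1000 labeled cells" setting.

```python
import numpy as np


def predict_proba(X, centroids, temperature=1.0):
    """Soft class assignment: softmax over negative squared centroid distances."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def active_learning(X, oracle, n_init=10, batch_size=10, budget=100, seed=0):
    """Pool-based active learning with margin-based uncertainty sampling.

    `oracle(indices)` (hypothetical) plays the biologist and returns one label
    per requested cell. Returns the queried indices and a label for every cell.
    """
    rng = np.random.default_rng(seed)
    queried = rng.choice(len(X), size=n_init, replace=False)
    y = dict(zip(queried.tolist(), oracle(queried)))
    while True:
        idx = np.array(list(y))
        lab = np.array([y[i] for i in idx])
        classes = np.unique(lab)
        # Nearest-centroid classifier fitted on the cells labeled so far.
        centroids = np.stack([X[idx[lab == c]].mean(axis=0) for c in classes])
        proba = predict_proba(X, centroids)
        unlabeled = np.setdiff1d(np.arange(len(X)), idx)
        if len(y) >= budget or len(unlabeled) == 0:
            break
        k = min(batch_size, budget - len(y), len(unlabeled))
        if len(classes) < 2:
            # A margin is undefined with one known class: query at random.
            picks = rng.choice(unlabeled, size=k, replace=False)
        else:
            top2 = np.sort(proba[unlabeled], axis=1)[:, -2:]
            margin = top2[:, 1] - top2[:, 0]
            # Query the cells whose two most likely labels are closest.
            picks = unlabeled[np.argsort(margin)[:k]]
        y.update(zip(picks.tolist(), oracle(picks)))
    pred = classes[proba.argmax(axis=1)]
    pred[idx] = lab  # keep the expert labels for the queried cells
    return idx, pred
```

For a quick check on synthetic data, one could call `active_learning(X, oracle=lambda i: true_labels[i], budget=60)` with well-separated Gaussian blobs standing in for cell types. Margin sampling is used here because it spends the labeling budget on cells near class boundaries, which is the general motivation for querying labels for only a subset of cells.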

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Cells, Cultured
  • Cluster Analysis
  • Gene Expression Profiling / methods*
  • Humans
  • Kidney / cytology
  • Kidney / metabolism
  • Leukocytes, Mononuclear / cytology
  • Leukocytes, Mononuclear / metabolism
  • Neurons / cytology
  • Neurons / metabolism
  • RNA-Seq / methods*
  • Reproducibility of Results
  • Single-Cell Analysis / methods*
  • Unsupervised Machine Learning*
  • Urinary Bladder / cytology
  • Urinary Bladder / metabolism