scSampler: fast diversity-preserving subsampling of large-scale single-cell transcriptomic data

Bioinformatics. 2022 May 26;38(11):3126-3127. doi: 10.1093/bioinformatics/btac271.

Abstract

Summary: The number of cells measured in single-cell transcriptomic data has grown fast in recent years. For such large-scale data, subsampling is a powerful and often necessary tool for exploratory data analysis. However, the easiest random subsampling is not ideal from the perspective of preserving rare cell types. Therefore, diversity-preserving subsampling is required for fast exploration of cell types in a large-scale dataset. Here, we propose scSampler, an algorithm for fast diversity-preserving subsampling of single-cell transcriptomic data.

Availability and implementation: scSampler is implemented in Python and is published under the MIT source license. It can be installed by "pip install scsampler" and used with the Scanpy pipline. The code is available on GitHub: https://github.com/SONGDONGYUAN1994/scsampler. An R interface is available at: https://github.com/SONGDONGYUAN1994/rscsampler.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Data Analysis
  • Software*
  • Transcriptome*