Comparison of high-throughput single-cell RNA sequencing data processing pipelines

Brief Bioinform. 2021 May 20;22(3):bbaa116. doi: 10.1093/bib/bbaa116.

Abstract

With the development of single-cell RNA sequencing (scRNA-seq) technology, it has become possible to perform large-scale transcript profiling for tens of thousands of cells in a single experiment. Many analysis pipelines have been developed for data generated from different high-throughput scRNA-seq platforms, which leaves users with the new challenge of choosing a workflow that is efficient, robust and reliable for a specific sequencing platform. Moreover, as the amount of public scRNA-seq data has increased rapidly, integrated analysis of scRNA-seq data from different sources has become increasingly popular. However, it remains unclear whether such integrated analysis would be biased if the data were processed by different upstream pipelines. In this study, we encapsulated seven existing high-throughput scRNA-seq data processing pipelines with Nextflow, a general integrative workflow management framework, and evaluated their performance in terms of running time, computational resource consumption and data analysis consistency using eight public datasets generated from five different high-throughput scRNA-seq platforms. Our work provides useful guidelines for the selection of scRNA-seq data processing pipelines based on their performance on different real datasets. In addition, these guidelines can serve as a performance evaluation framework for future developments in high-throughput scRNA-seq data processing.
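
The abstract does not state how data analysis consistency between pipelines was quantified. As a rough illustration only, the Python sketch below compares the outputs of two pipelines run on the same dataset using two simple measures: the Jaccard overlap of the detected cell barcodes, and the Pearson correlation of per-gene total counts over the shared cells. The function names (`jaccard`, `pearson`, `compare_pipelines`) and the nested-dictionary input layout are assumptions made for this example, not part of the study or of any pipeline's API.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard index of two collections of cell barcodes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy / sqrt(sxx * syy)


def compare_pipelines(counts_a, counts_b):
    """Compare two count matrices given as {barcode: {gene: count}}.

    Hypothetical layout: each pipeline's output has already been loaded
    into a nested dict; missing genes are treated as zero counts.
    """
    shared_cells = sorted(set(counts_a) & set(counts_b))
    genes = sorted(
        {g for c in shared_cells for g in counts_a[c]}
        | {g for c in shared_cells for g in counts_b[c]}
    )
    # Sum each gene's counts over the cells both pipelines retained.
    totals_a = [sum(counts_a[c].get(g, 0) for c in shared_cells) for g in genes]
    totals_b = [sum(counts_b[c].get(g, 0) for c in shared_cells) for g in genes]
    return {
        "n_shared_cells": len(shared_cells),
        "barcode_jaccard": jaccard(counts_a, counts_b),
        "gene_total_pearson": pearson(totals_a, totals_b),
    }
```

A low barcode overlap would suggest that the two pipelines disagree on cell calling, while a low gene-level correlation on shared cells would point to differences in alignment or counting. Real comparisons would typically work on sparse matrices and use additional measures.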

Keywords: data processing; performance comparison; pipeline; scRNA-seq.

Publication types

  • Comparative Study

MeSH terms

  • 3T3 Cells
  • Algorithms*
  • Animals
  • Databases, Nucleic Acid*
  • HEK293 Cells
  • Humans
  • Mice
  • RNA* / biosynthesis
  • RNA* / genetics
  • RNA-Seq*
  • Single-Cell Analysis*

Substances

  • RNA