The need to reassess single-cell RNA sequencing datasets: the importance of biological sample processing

Alex M Ascensión; Marcos J Araúzo-Bravo; Ander Izeta

doi:10.12688/f1000research.54864.2

F1000Res. 2021 Aug 6;10:767. doi: 10.12688/f1000research.54864.2. eCollection 2021.

Authors

Alex M Ascensión^{1,2}, Marcos J Araúzo-Bravo^{1,3,4,5,6}, Ander Izeta^{2,7}

Affiliations

¹ Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastian, Gipuzkoa, 20014, Spain.
² Tissue Engineering Group, Biodonostia Health Research Institute, San Sebastian, Gipuzkoa, 20014, Spain.
³ Computational Biomedicine Data Analysis Platform, Biodonostia Health Research Institute, San Sebastian, Gipuzkoa, 20014, Spain.
⁴ IKERBASQUE, Basque Foundation for Science, Bilbao, Spain.
⁵ CIBER of Frailty and Healthy Aging (CIBERfes), Madrid, Spain.
⁶ Computational Biology and Bioinformatics Group, Max Planck Institute for Molecular Biomedicine, Münster, Germany.
⁷ Department of Biomedical Engineering and Science, Tecnun-University of Navarra, School of Engineering, San Sebastian, Gipuzkoa, 20009, Spain.

Abstract

Background: The advent of single-cell RNA sequencing (scRNAseq) and additional single-cell omics technologies has provided scientists with unprecedented tools to explore biology at cellular resolution. However, obtaining an appropriate number of good-quality reads per cell and a reasonable number of cells within each population of interest is key to inferring relevant conclusions about the underlying biology of the dataset. For these reasons, scRNAseq studies are constantly increasing the number of cells analysed and the granularity of the resulting transcriptomic analyses.

Methods: We aimed to identify previously described fibroblast subpopulations in healthy adult human skin by using the largest dataset published to date (528,253 sequenced cells) and an unsupervised population-matching algorithm.

Results: Our reanalysis of this landmark resource demonstrates that a substantial proportion of cell transcriptomic signatures may be biased by cellular stress and the response to hypoxic conditions.

Conclusions: We postulate that careful design of experimental conditions is needed to avoid long processing times of biological samples. Additionally, the computation of large datasets might limit the extent of the analysis, possibly because of long processing times.

Keywords: Python; computational analysis; fibroblasts; reproducibility; single-cell RNA-seq; skin.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Gene Expression Profiling*
Humans
Sequence Analysis, RNA
Single-Cell Analysis*
Specimen Handling
Transcriptome