A practical solution to pseudoreplication bias in single-cell studies

Nat Commun. 2021 Feb 2;12(1):738. doi: 10.1038/s41467-021-21038-1.

Abstract

Cells from the same individual share common genetic and environmental backgrounds and are not statistically independent; therefore, they are subsamples or pseudoreplicates. Thus, single-cell data have a hierarchical structure that many current single-cell methods do not address, leading to biased inference, highly inflated type 1 error rates, and reduced robustness and reproducibility. This includes methods that use a batch effect correction for individual as a means of accounting for within-sample correlation. Here, we document this dependence across a range of cell types and show that pseudo-bulk aggregation methods are conservative and underpowered relative to mixed models. To compute differential expression within a specific cell type across treatment groups, we propose applying generalized linear mixed models with a random effect for individual, to properly account for both zero inflation and the correlation structure among measures from cells within an individual. Finally, we provide power estimates across a range of experimental conditions to assist researchers in designing appropriately powered studies.
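The inflation of type 1 error described above can be illustrated with a small simulation. The sketch below is not the authors' method: it uses Gaussian values rather than zero-inflated counts, and it compares a naive cell-level z-test against a t-test on per-individual means (a simple aggregation approach), not the generalized linear mixed model the abstract proposes. All function names, the group sizes (5 individuals per group, 50 cells each), and the variance settings are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_group(rng, n_individuals, n_cells, sd_individual, sd_cell):
    """Simulate one treatment group under the null (no treatment effect).

    Each individual gets a random shift shared by all of its cells, which
    makes cells from the same individual correlated.
    """
    group = []
    for _ in range(n_individuals):
        shift = rng.gauss(0.0, sd_individual)
        group.append([shift + rng.gauss(0.0, sd_cell) for _ in range(n_cells)])
    return group


def cell_level_rejects(group_a, group_b, alpha=0.05):
    """Two-sample z-test that treats every cell as an independent replicate."""
    cells_a = [x for individual in group_a for x in individual]
    cells_b = [x for individual in group_b for x in individual]
    se = (statistics.variance(cells_a) / len(cells_a)
          + statistics.variance(cells_b) / len(cells_b)) ** 0.5
    z = (statistics.fmean(cells_a) - statistics.fmean(cells_b)) / se
    p_value = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))
    return p_value < alpha


def individual_level_rejects(group_a, group_b, t_crit):
    """Pooled two-sample t-test on per-individual means (one value per individual)."""
    means_a = [statistics.fmean(individual) for individual in group_a]
    means_b = [statistics.fmean(individual) for individual in group_b]
    n_a, n_b = len(means_a), len(means_b)
    pooled_var = ((n_a - 1) * statistics.variance(means_a)
                  + (n_b - 1) * statistics.variance(means_b)) / (n_a + n_b - 2)
    se = (pooled_var * (1.0 / n_a + 1.0 / n_b)) ** 0.5
    t = (statistics.fmean(means_a) - statistics.fmean(means_b)) / se
    return abs(t) > t_crit


def false_positive_rates(n_sims=500, seed=0):
    """Estimate how often each test rejects when there is no true effect."""
    rng = random.Random(seed)
    n_individuals, n_cells = 5, 50   # assumed design, for illustration only
    t_crit = 2.306  # two-sided 5% critical value of Student's t, 5 + 5 - 2 = 8 df
    naive = aggregated = 0
    for _ in range(n_sims):
        a = simulate_group(rng, n_individuals, n_cells, 0.5, 1.0)
        b = simulate_group(rng, n_individuals, n_cells, 0.5, 1.0)
        naive += cell_level_rejects(a, b)
        aggregated += individual_level_rejects(a, b, t_crit)
    return naive / n_sims, aggregated / n_sims


if __name__ == "__main__":
    naive_rate, aggregated_rate = false_positive_rates()
    print(f"cell-level false positive rate:       {naive_rate:.3f}")
    print(f"individual-level false positive rate: {aggregated_rate:.3f}")
```

Under these assumed settings, the cell-level test should reject well above the nominal 5% rate even though no effect exists, while the individual-level test should stay close to 5%. The sketch says nothing about power, which is where the abstract reports that mixed models outperform aggregation.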

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computer Simulation*
  • Quality Control
  • Sequence Analysis, RNA / methods
  • Transcriptome / genetics