Valid Post-clustering Differential Analysis for Single-Cell RNA-Seq

Jesse M Zhang; Govinda M Kamath; David N Tse

doi:10.1016/j.cels.2019.07.012

Cell Syst. 2019 Oct 23;9(4):383-392.e6. doi: 10.1016/j.cels.2019.07.012. Epub 2019 Sep 11.

Authors

Jesse M Zhang¹, Govinda M Kamath¹, David N Tse²

Affiliations

¹ Electrical Engineering, Stanford University, Stanford, CA 94305, USA.
² Electrical Engineering, Stanford University, Stanford, CA 94305, USA. Electronic address: dntse@stanford.edu.

Abstract

Single-cell computational pipelines involve two critical steps: organizing cells (clustering) and identifying the markers driving this organization (differential expression analysis). State-of-the-art pipelines perform differential analysis after clustering on the same dataset. We observe that because clustering "forces" separation, reusing the same dataset generates artificially low p values and hence false discoveries. We introduce a valid post-clustering differential analysis framework, which corrects for this problem. We provide software at https://github.com/jessemzhang/tn_test.
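The problem described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' framework or the software at the linked repository. It assumes a single gene measured in one homogeneous population, so no true marker exists, and it uses a median split on that gene as a simplified stand-in for clustering. The names `two_sample_p_value` and `simulate` are hypothetical, and the p value uses a normal approximation to the Welch statistic.

```python
import math
import random
import statistics


def two_sample_p_value(a, b):
    """Two-sided p value for a difference in means.

    Uses the Welch statistic with a normal approximation, which is
    reasonable for the sample sizes used here.
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_cells=200, seed=0):
    rng = random.Random(seed)
    # One gene, one homogeneous population: there is no true marker.
    expression = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]

    # Stand-in for clustering (an assumption of this sketch): split the
    # cells at the median of the very gene that is tested afterwards.
    cutoff = statistics.median(expression)
    low = [x for x in expression if x <= cutoff]
    high = [x for x in expression if x > cutoff]
    p_reused = two_sample_p_value(low, high)

    # Control: group labels assigned independently of the data.
    shuffled = expression[:]
    rng.shuffle(shuffled)
    half = n_cells // 2
    p_independent = two_sample_p_value(shuffled[:half], shuffled[half:])
    return p_reused, p_independent


if __name__ == "__main__":
    p_reused, p_independent = simulate()
    print(f"labels from clustering the same data: p = {p_reused:.3g}")
    print(f"labels independent of the data:       p = {p_independent:.3g}")
```

Because the groups are defined by the same values that are then tested, the first p value is extremely small even though the population is homogeneous. The second p value, computed with labels that ignore the data, behaves like an ordinary null p value.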

Keywords: differential expression; p value; selective inference; single-cell RNA-seq.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Cluster Analysis
Computational Biology / methods*
Datasets as Topic
Gene Expression Profiling
Humans
Selection Bias
Sequence Analysis, RNA / methods*
Single-Cell Analysis / methods*
Software
