Single-cell computational pipelines involve two critical steps: organizing cells (clustering) and identifying the markers driving this organization (differential expression analysis). State-of-the-art pipelines perform differential analysis after clustering on the same dataset. We observe that because clustering "forces" separation, reusing the same dataset generates artificially low p values and hence false discoveries. We introduce a valid post-clustering differential analysis framework, which corrects for this problem. We provide software at https://github.com/jessemzhang/tn_test.
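The inflation described above can be reproduced in a toy setting. The sketch below is an illustration under stated assumptions, not the authors' framework and not the API of the tn_test package. It draws a single "gene" from one normal distribution, so there is no true difference between groups. A median split stands in for clustering, and a Welch statistic with a normal approximation stands in for the differential expression test. The names `welch_p_value` and `simulate` are hypothetical helpers introduced only for this example.

```python
import math
import random
import statistics


def welch_p_value(a, b):
    """Two-sided p value for a difference in means.

    Uses the Welch statistic with a normal approximation to its null
    distribution (an assumption that is reasonable for ~100 cells per group).
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_cells=200, n_trials=200, seed=0):
    """Return (naive, control) lists of p values on data with no real groups."""
    rng = random.Random(seed)
    naive, control = [], []
    for _ in range(n_trials):
        # One gene, one homogeneous population: the null hypothesis is true.
        expr = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]

        # Stand-in for clustering: split the cells at the median expression.
        cut = statistics.median(expr)
        low = [x for x in expr if x <= cut]
        high = [x for x in expr if x > cut]
        # Naive analysis: test the same data that defined the groups.
        naive.append(welch_p_value(low, high))

        # Control: group labels assigned at random, independent of the data.
        shuffled = expr[:]
        rng.shuffle(shuffled)
        half = n_cells // 2
        control.append(welch_p_value(shuffled[:half], shuffled[half:]))
    return naive, control


if __name__ == "__main__":
    naive, control = simulate()
    rate = lambda ps: sum(p < 0.05 for p in ps) / len(ps)
    print("rejection rate at 0.05, groups defined by clustering:", rate(naive))
    print("rejection rate at 0.05, groups defined at random:   ", rate(control))
```

Because the split is chosen to separate the values, the two groups differ in mean by construction, and the naive test rejects far more often than the nominal level even though all cells come from one population. With random labels the rejection rate stays close to the nominal level. This shows the problem only; a valid correction has to account for how the groups were selected.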
Keywords: differential expression; p value; selective inference; single-cell RNA-seq.
Copyright © 2019. Published by Elsevier Inc.