Testing for a difference in means of a single feature after clustering

Yiqun T Chen; Lucy L Gao

ArXiv [Preprint]. 2023 Nov 27:arXiv:2311.16375v1.

Authors

Yiqun T Chen¹, Lucy L Gao²

Affiliations

¹ Department of Biomedical Data Science, Stanford University.
² Department of Statistics, University of British Columbia, November 29, 2023.

PMID: 38076519
PMCID: PMC10705581

Abstract

For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or k-means clustering. The test based on the proposed p-value controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation and demonstrate its use on single-cell RNA-sequencing data.
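The Type I error inflation described above can be seen in a small simulation. The sketch below is not the authors' proposed test; it only illustrates the problem that motivates it, under simplifying assumptions that are not from the paper: a single standard normal feature with no true clusters, a hand-written two-cluster k-means on that one feature, and a normal approximation to the two-sample test. All function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def two_means_1d(x, iters=50):
    """Lloyd's algorithm with k=2 on one feature; returns a 0/1 label per point."""
    c0, c1 = min(x), max(x)  # deterministic start at the two extremes
    labels = []
    for _ in range(iters):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in x]
        new_c0 = mean(v for v, lab in zip(x, labels) if lab == 0)
        new_c1 = mean(v for v, lab in zip(x, labels) if lab == 1)
        if (new_c0, new_c1) == (c0, c1):  # centroids stopped moving
            break
        c0, c1 = new_c0, new_c1
    return labels


def naive_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    if len(a) < 2 or len(b) < 2:  # variance undefined; treat as no evidence
        return 1.0
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(cluster_first, n=60, reps=500, alpha=0.05, seed=0):
    """Fraction of null datasets on which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]  # null data: one population
        if cluster_first:
            labels = two_means_1d(x)  # groups chosen by looking at the data
        else:
            labels = [i % 2 for i in range(n)]  # groups fixed in advance
        a = [v for v, lab in zip(x, labels) if lab == 0]
        b = [v for v, lab in zip(x, labels) if lab == 1]
        rejections += naive_p_value(a, b) < alpha
    return rejections / reps


naive_after_clustering = rejection_rate(cluster_first=True)
naive_fixed_groups = rejection_rate(cluster_first=False)
print("rejection rate, groups from clustering:", naive_after_clustering)
print("rejection rate, groups fixed in advance:", naive_fixed_groups)
```

With groups fixed before seeing the data, the rejection rate should stay near the nominal level; with groups chosen by clustering the same data, the naive test rejects far more often even though no true difference exists. A selective test of the kind the abstract proposes conditions on the clustering event to restore control.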

Keywords: Hypothesis testing; Post-selection inference; Type I Error; Unsupervised learning.

Publication types

Preprint
