Motivation: A large number of studies have shown that clustering is a crucial step in scRNA-seq analysis. Most existing methods are based on unsupervised learning without the prior exploitation of any domain knowledge, which does not utilize available gold-standard labels. When confronted by the high dimensionality and general dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicate cell type assignment.
Results: In this article, we propose a semi-supervised clustering method based on a capsule network named scCNC that integrates domain knowledge into the clustering step. Significantly, we also propose a Semi-supervised Greedy Iterative Training method used to train the whole network. Experiments on some real scRNA-seq datasets show that scCNC can significantly improve clustering performance and facilitate downstream analyses.
Availability and implementation: The source code of scCNC is freely available at https://github.com/WHY-17/scCNC.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.