ScCCL: Single-Cell Data Clustering Based on Self-Supervised Contrastive Learning

IEEE/ACM Trans Comput Biol Bioinform. 2023 May-Jun;20(3):2233-2241. doi: 10.1109/TCBB.2023.3241129. Epub 2023 Jun 5.

Abstract

The growing maturity of single-cell RNA sequencing (scRNA-seq) technology allows us to explore the heterogeneity of tissues, organisms, and complex diseases at the cellular level. Clustering is a central step in single-cell data analysis. However, the high dimensionality of scRNA-seq data, the ever-increasing number of cells, and unavoidable technical noise make clustering challenging. Motivated by the strong performance of contrastive learning in multiple domains, we propose ScCCL, a novel self-supervised contrastive learning method for clustering scRNA-seq data. ScCCL first randomly masks the gene expression of each cell twice and adds a small amount of Gaussian noise. It then uses a momentum encoder structure to extract features from the augmented data. Contrastive learning is then applied in two modules: an instance-level contrastive learning module and a cluster-level contrastive learning module. Training yields a representation model that can efficiently extract high-order embeddings of single cells. We evaluated ScCCL on multiple public datasets using two metrics, the adjusted Rand index (ARI) and normalized mutual information (NMI). The results show that ScCCL improves clustering performance compared with the benchmark algorithms. Notably, because ScCCL does not depend on a specific data type, it can also be helpful for clustering analysis of single-cell multi-omics data.
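The training steps named in the abstract can be sketched in NumPy as follows. The steps are two masked and noised views per cell, an online encoder paired with a momentum encoder, and instance-level plus cluster-level contrastive losses. This is a minimal, forward-only illustration under stated assumptions, not the authors' implementation. The following are all assumptions:

- the masking rate, noise level, momentum coefficient, and temperatures are placeholder values;
- the linear "encoder" and cluster head are hypothetical stand-ins for ScCCL's networks;
- the loss forms are borrowed from the general contrastive-clustering literature rather than taken from the paper. The instance-level loss is a SimCLR-style NT-Xent loss. The cluster-level loss applies NT-Xent over cluster-assignment columns and subtracts a cluster-size entropy term.

```python
import numpy as np


def augment(x, rng, mask_rate=0.2, noise_std=0.01):
    """Return one augmented view: randomly zero out genes per cell, then add Gaussian noise.

    x: (n_cells, n_genes) expression matrix. mask_rate and noise_std are
    placeholder values (assumptions), not ScCCL's published settings.
    """
    keep = rng.random(x.shape) >= mask_rate  # True = gene kept for that cell
    return x * keep + rng.normal(0.0, noise_std, size=x.shape)


def momentum_update(online, target, m=0.99):
    """Exponential moving average: target <- m * target + (1 - m) * online."""
    return {k: m * target[k] + (1.0 - m) * online[k] for k in target}


def _log_softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def nt_xent(a, b, tau=0.5):
    """SimCLR-style NT-Xent loss: row i of `a` and row i of `b` are a positive pair.

    Each of the 2N L2-normalized rows treats its partner as the positive and
    the remaining 2N - 2 rows as negatives.
    """
    z = np.concatenate([a, b], axis=0)
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    n = a.shape[0]
    sim = z @ z.T / tau  # cosine similarities scaled by temperature
    np.fill_diagonal(sim, -np.inf)  # a row is never its own positive or negative
    partner = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    return -_log_softmax(sim, axis=1)[np.arange(2 * n), partner].mean()


def cluster_loss(logits_a, logits_b, tau=1.0):
    """Contrast cluster-assignment columns across views, minus cluster-size entropy."""
    pa = np.exp(_log_softmax(logits_a, axis=1))  # (n_cells, K) soft assignments
    pb = np.exp(_log_softmax(logits_b, axis=1))

    def size_entropy(p):
        q = p.sum(axis=0) / p.sum()  # fraction of cells assigned to each cluster
        return -(q * np.log(q + 1e-12)).sum()

    # Columns act as cluster representations, so the same loss applies at cluster level.
    return nt_xent(pa.T, pb.T, tau) - size_entropy(pa) - size_entropy(pb)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.log1p(rng.poisson(2.0, size=(8, 50)).astype(float))  # toy data: 8 cells x 50 genes
    v1, v2 = augment(x, rng), augment(x, rng)  # two independently masked + noised views

    online = {"w": 0.1 * rng.normal(size=(50, 16))}  # hypothetical linear encoder
    target = {k: v.copy() for k, v in online.items()}  # momentum encoder starts as a copy
    online["w"] = online["w"] + 0.01  # stand-in for one optimizer step on the online encoder
    target = momentum_update(online, target)  # target drifts slowly toward online

    h1, h2 = v1 @ online["w"], v2 @ target["w"]
    head = 0.1 * rng.normal(size=(16, 4))  # hypothetical cluster head with K = 4
    print("instance-level loss:", nt_xent(h1, h2))
    print("cluster-level loss:", cluster_loss(h1 @ head, h2 @ head))
```

Treating each column of the soft-assignment matrix as a cluster representation is what lets one contrastive loss serve both levels. The entropy term penalizes solutions that put almost every cell into a single cluster.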
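ARI and NMI both compare a predicted clustering against reference cell-type labels through their contingency table. The sketch below computes both from scratch in NumPy. It uses the standard (Hubert and Arabie) adjusted Rand index and normalizes mutual information by the arithmetic mean of the two label entropies, which is the scikit-learn default. The abstract does not state which NMI normalization was used in the ScCCL experiments, so treat that choice as an assumption.

```python
import numpy as np


def contingency(labels_true, labels_pred):
    """Count matrix: entry (i, j) = number of cells with true class i and predicted cluster j."""
    _, t = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)
    return table


def comb2(x):
    """Elementwise "x choose 2"."""
    x = np.asarray(x, dtype=float)
    return x * (x - 1.0) / 2.0


def adjusted_rand_index(labels_true, labels_pred):
    table = contingency(labels_true, labels_pred)
    sum_ij = comb2(table).sum()             # pairs placed together in both partitions
    sum_a = comb2(table.sum(axis=1)).sum()  # pairs together in the true partition
    sum_b = comb2(table.sum(axis=0)).sum()  # pairs together in the predicted partition
    expected = sum_a * sum_b / comb2(table.sum())
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return float((sum_ij - expected) / (max_index - expected))


def _entropy(counts):
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the arithmetic mean of the two label entropies."""
    table = contingency(labels_true, labels_pred).astype(float)
    h_true, h_pred = _entropy(table.sum(axis=1)), _entropy(table.sum(axis=0))
    if h_true == 0.0 and h_pred == 0.0:  # both partitions are a single cluster
        return 1.0
    pij = table / table.sum()
    outer = pij.sum(axis=1, keepdims=True) @ pij.sum(axis=0, keepdims=True)
    nz = pij > 0
    mi = (pij[nz] * np.log(pij[nz] / outer[nz])).sum()
    return float(mi / (0.5 * (h_true + h_pred)))


if __name__ == "__main__":
    truth, pred = [0, 0, 1, 1], [0, 0, 1, 2]
    print(adjusted_rand_index(truth, pred))     # 4/7, about 0.571
    print(normalized_mutual_info(truth, pred))  # about 0.8
```

Both scores depend only on the contingency table, so they are unchanged when cluster IDs are permuted. This matters because the label numbering produced by a clustering algorithm is arbitrary.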

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Gene Expression Profiling* / methods
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods