Dual-GCN-based deep clustering with triplet contrast for ScRNA-seq data analysis

Comput Biol Chem. 2023 Oct:106:107924. doi: 10.1016/j.compbiolchem.2023.107924. Epub 2023 Jul 17.

Abstract

Single-cell RNA sequencing (scRNA-seq) technology reveals gene expression information at the cellular level. The critical tasks in scRNA-seq data analysis are clustering and dimensionality reduction. Recent deep clustering algorithms optimize these two tasks jointly, and a variant of them, graph-based deep clustering algorithms, additionally captures and preserves topological information during this process. However, existing graph-based deep clustering algorithms ignore the distribution information of nodes when constructing cell graphs, which leaves the embedding representation incomplete. In addition, graph convolutional networks (GCNs), the most commonly used graph encoders, often suffer from over-smoothing: samples become highly similar in the embedding representation, which in turn degrades clustering performance. Here, dual-GCN-based deep clustering with triplet contrast (scDGDC) is proposed for dimensionality reduction and clustering of scRNA-seq data. Its two critical components are a dual-GCN-based encoder, which captures more comprehensive topological information, and triplet contrast, which reduces GCN over-smoothing. These two components improve the dimensionality reduction and clustering performance of scDGDC in terms of information acquisition and model optimization, respectively. Experiments on eight real scRNA-seq datasets showed that scDGDC performs excellently on both clustering and dimensionality reduction tasks and is highly robust to parameter settings.
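The abstract attributes poor clustering to GCN over-smoothing and introduces triplet contrast as a counterweight. The NumPy snippet below is a minimal sketch of both generic ideas, not scDGDC's implementation. It assumes a single hypothetical toy cell graph (the dual-GCN encoder and the paper's graph construction are not modeled). It also drops the learnable weights and activations from each GCN layer so that only the neighbor-averaging step remains. Finally, it hand-picks the anchor, positive and negative embeddings, because the abstract does not say how scDGDC selects triplets. All function and variable names are illustrative. The loss has the same form as PyTorch's `nn.TripletMarginLoss` with its default margin of 1.0.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Standard GCN renormalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degrees include the self-loop
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate(a_norm: np.ndarray, x: np.ndarray, n_layers: int) -> np.ndarray:
    """Stack n_layers of linear GCN propagation (weights and activations omitted)."""
    h = x
    for _ in range(n_layers):
        h = a_norm @ h
    return h


def mean_pairwise_cosine(h: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of node embeddings."""
    unit = h / np.linalg.norm(h, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = h.shape[0]
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Mean triplet margin loss max(0, ||a - p|| - ||a - n|| + margin)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())


# Hypothetical toy cell graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

a_norm = normalize_adjacency(adj)
x = np.eye(6)  # one-hot node features keep the example deterministic

shallow = mean_pairwise_cosine(propagate(a_norm, x, 1))
deep = mean_pairwise_cosine(propagate(a_norm, x, 50))
print(f"mean cosine after 1 layer: {shallow:.3f}, after 50 layers: {deep:.3f}")

# Triplet margin loss on hand-picked 2-D embeddings.
a = np.array([[0.0, 0.0]])
p = np.array([[0.0, 1.0]])
loss_easy = triplet_loss(a, p, np.array([[3.0, 4.0]]))  # negative already far away
loss_hard = triplet_loss(a, p, np.array([[0.0, 0.5]]))  # negative closer than positive
print(f"easy triplet loss: {loss_easy}, hard triplet loss: {loss_hard}")
```

Stacking many normalized propagation steps drives every node's embedding toward the same direction, so the mean pairwise cosine approaches 1, even for cells in different triangles. This is the over-smoothing the abstract describes. The triplet term is zero only when each negative lies at least `margin` farther from its anchor than the positive does, so minimizing it pushes dissimilar cells apart and counteracts that collapse.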

Keywords: Deep clustering; Diffusion maps; Dimensionality reduction; GCN; ScRNA-seq data; Triplet contrast.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Data Analysis
  • Single-Cell Gene Expression Analysis*