A Fusion Learning Model Based on Deep Learning for Single-Cell RNA Sequencing Data Clustering

J Comput Biol. 2024 May 20. doi: 10.1089/cmb.2024.0512. Online ahead of print.

Abstract

Single-cell RNA sequencing (scRNA-seq) technology provides a means of studying biology at the level of individual cells. A fundamental goal of scRNA-seq data analysis is to distinguish cell types using unsupervised clustering. Although many single-cell clustering algorithms have been proposed recently, few take both deep and surface information into account. This article therefore proposes scGASI, a fusion learning framework based on deep learning. To learn a clustering similarity matrix, scGASI integrates data affinity recovery and deep feature embedding in a unified scheme built on multiple top feature sets. First, scGASI uses a graph autoencoder to learn a low-dimensional latent representation that captures the hidden information in the data. Then, to efficiently merge the surface information from the raw data space with the deeper latent information from the embedding space, we construct a fusion learning model based on self-expression. scGASI uses this fusion learning model to learn the similarity matrix of each individual feature set as well as the clustering similarity matrix across all feature sets. Lastly, the clustering similarity matrix is used for clustering, visualization, and marker gene identification. Extensive validation on real data sets demonstrates that scGASI achieves higher clustering accuracy than many widely used clustering methods.
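
To make the idea of a self-expression-based fusion concrete, the following is a minimal sketch under stated assumptions, not the scGASI implementation. It assumes a ridge-regularized self-expression objective shared by a raw view and a latent view, which has a closed-form solution. It substitutes a plain PCA projection for the graph autoencoder, and it uses a two-way spectral split in place of a full clustering step. The function names (`pca_embed`, `self_expression`, `two_way_spectral`), the view weights, the value of `lam`, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def pca_embed(X, k):
    """Top-k principal-component scores of X (genes x cells), shape (k, cells).

    Stand-in for a learned latent representation; NOT a graph autoencoder.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return s[:k, None] * Vt[:k]


def self_expression(views, weights, lam):
    """Closed-form ridge self-expression shared across several views.

    Minimizes sum_k w_k * ||V_k - V_k C||_F^2 + lam * ||C||_F^2, whose
    solution is C = (G + lam * I)^{-1} G with G = sum_k w_k * V_k^T V_k.
    This objective is an assumption of the sketch, not scGASI's objective.
    """
    G = sum(w * (V.T @ V) for w, V in zip(weights, views))
    C = np.linalg.solve(G + lam * np.eye(G.shape[0]), G)
    np.fill_diagonal(C, 0.0)  # post hoc: ignore each cell's weight on itself
    return C


def similarity(C):
    """Symmetric, non-negative cell-by-cell similarity from coefficients C."""
    A = np.abs(C)
    return (A + A.T) / 2.0


def two_way_spectral(S):
    """Split cells into two groups by the sign of the Fiedler vector of the
    symmetric normalized Laplacian of S."""
    d = S.sum(axis=1)
    L = np.eye(S.shape[0]) - S / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)


# Toy data (hypothetical): 20 genes x 30 cells, two cell types that each
# express a disjoint block of 10 genes, plus Gaussian noise.
rng = np.random.default_rng(0)
n_genes, n_per_type = 20, 15
means = np.zeros((n_genes, 2))
means[:10, 0] = 5.0
means[10:, 1] = 5.0
truth = np.repeat([0, 1], n_per_type)
X = means[:, truth] + 0.3 * rng.standard_normal((n_genes, 2 * n_per_type))

Z = pca_embed(X, k=2)                                   # latent view
C = self_expression([X, Z], weights=[1.0, 1.0], lam=50.0)  # fuse raw + latent
S = similarity(C)
labels = two_way_spectral(S)
```

In this sketch the raw view `X` and the latent view `Z` share one coefficient matrix `C`, so a cell is pushed to be reconstructed from the same neighbors in both spaces. Real scRNA-seq data would additionally need normalization, feature selection, and a clustering step that handles more than two groups.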

Keywords: clustering; deep learning; fusion learning; scRNA-seq; self-expression.