SSNMDI: a novel joint learning model of semi-supervised non-negative matrix factorization and data imputation for clustering of single-cell RNA-seq data

Brief Bioinform. 2023 May 19;24(3):bbad149. doi: 10.1093/bib/bbad149.

Abstract

Motivation: Single-cell RNA sequencing (scRNA-seq) technology has attracted extensive attention in the biomedical field. It can be used to measure gene expression and analyze the transcriptome at the single-cell level, enabling the identification of cell types through unsupervised clustering. Data imputation and dimension reduction are usually conducted before clustering because scRNA-seq data have a high 'dropout' rate, substantial noise and linear inseparability. However, performing dimension reduction, imputation and clustering as independent steps cannot fully characterize the patterns in scRNA-seq data, resulting in poor clustering performance. Herein, we propose a novel and accurate algorithm, SSNMDI, that uses a joint learning approach to simultaneously perform imputation, dimensionality reduction and cell clustering in a non-negative matrix factorization (NMF) framework. In addition, we integrate cell annotations as prior information, which transforms the joint learning problem into a semi-supervised NMF model. Through experiments on 14 datasets, we demonstrate that SSNMDI converges faster, reduces dimensionality more effectively and clusters cells more accurately than previous methods, providing an accurate and robust strategy for analyzing scRNA-seq data. Biological analyses, including pseudotime analysis, gene ontology analysis and survival analysis, are also conducted to validate the biological significance of our method. We believe that we are among the first to combine imputation, partial label information, dimension reduction and clustering in the single-cell field.
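The idea of imputing and factorizing in one step can be illustrated with a minimal sketch. The code below is not the SSNMDI objective, which the abstract does not spell out; it is a generic masked (weighted) NMF with multiplicative updates, and it omits the semi-supervised label term. The function name `masked_nmf`, its parameters, the choice to treat every zero as a possible dropout, and the argmax cluster assignment are all assumptions made for illustration.

```python
import numpy as np


def masked_nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Illustrative masked NMF: X (genes x cells) ~ W @ H on observed entries.

    Assumption: zeros in X are treated as unobserved (possible dropouts),
    so they do not contribute to the reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    M = (X > 0).astype(float)            # 1 = observed, 0 = treated as missing
    W = rng.random((n_genes, k)) + eps   # non-negative gene factors
    H = rng.random((k, n_cells)) + eps   # non-negative cell embedding
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates for ||M * (X - W H)||_F^2
        WH = W @ H
        W *= ((M * X) @ H.T) / ((M * WH) @ H.T + eps)
        WH = W @ H
        H *= (W.T @ (M * X)) / (W.T @ (M * WH) + eps)
        losses.append(float(np.sum((M * (X - W @ H)) ** 2)))
    # Imputation: keep observed values, fill masked entries from the model
    X_imputed = np.where(M > 0, X, W @ H)
    # Clustering (assumed rule): assign each cell to its dominant factor
    labels = np.argmax(H, axis=0)
    return W, H, X_imputed, labels, losses


# Synthetic example: rank-2 non-negative data with about 30% simulated dropout
rng = np.random.default_rng(1)
X_full = rng.random((50, 2)) @ rng.random((2, 40))
X_obs = X_full.copy()
X_obs[rng.random(X_obs.shape) < 0.3] = 0.0
W, H, X_imputed, labels, losses = masked_nmf(X_obs, k=2)
```

In this sketch the low-dimensional matrix `H` serves as both the reduced representation and the basis for cluster assignment, while the same factorization fills in the masked entries, which is the sense in which the three tasks are learned jointly rather than as separate steps.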

Availability and implementation: The source code for SSNMDI is available at https://github.com/yushanqiu/SSNMDI.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Gene Expression Profiling*
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods
  • Single-Cell Gene Expression Analysis*