Non-Negative Low-Rank Representation With Similarity Correction for Cell Type Identification in scRNA-Seq Data

IEEE/ACM Trans Comput Biol Bioinform. 2023 Nov-Dec;20(6):3737-3747. doi: 10.1109/TCBB.2023.3319375. Epub 2023 Dec 25.

Abstract

Single-cell RNA sequencing (scRNA-Seq) has emerged as a powerful tool for investigating cellular heterogeneity within tissues, organs, and organisms. Identifying cell types is a fundamental problem in single-cell gene expression analysis and a critical step in the data processing workflow. However, existing methods that identify cell types by learning low-dimensional latent embeddings often overlook the structural relationships among cells. In this paper, we present a non-negative low-rank similarity correction model (NLRSIM) that uses subspace clustering to preserve the global structure among cells. The model introduces a manifold learning procedure that addresses the imbalanced spatial density of cell neighbourhoods and thereby preserves local geometric structure. This procedure uses a position-sensitive hashing algorithm to construct the graph structure of the data. Experimental results show that NLRSIM outperforms other advanced models in clustering performance and in visualization experiments. The effectiveness of the gene expression information calibrated by NLRSIM is further validated in related biological analyses. NLRSIM thus offers insight into gene expression, cell states, and structure at the single-cell level.
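
The low-rank subspace idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the NLRSIM model: it omits the similarity correction, the manifold term, and the hashing-based graph, and it assumes noise-free toy data. It uses the textbook closed-form solution of the noiseless low-rank representation problem (minimise the nuclear norm of Z subject to X = XZ), and takes absolute values only as a simple way to obtain a non-negative affinity. The function name `low_rank_affinity`, the tolerance, and the toy data are all hypothetical.

```python
import numpy as np

def low_rank_affinity(X, tol=1e-8):
    """Illustrative only. X is genes x cells; returns a non-negative,
    symmetric cell-by-cell affinity from a low-rank self-representation."""
    # Skinny SVD X = U diag(s) Vt; keep numerically non-zero singular values.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    V = Vt[:r].T                      # cells x r basis of the row space
    Z = V @ V.T                       # closed-form minimiser for noise-free data
    return np.abs(Z)                  # Z is symmetric; abs makes it non-negative

rng = np.random.default_rng(0)
# Toy data (assumption): two "cell types", each confined to its own
# 2-dimensional subspace of a 50-gene expression space, 10 cells per type.
basis_a = rng.normal(size=(50, 2))
basis_b = rng.normal(size=(50, 2))
X = np.hstack([basis_a @ rng.normal(size=(2, 10)),
               basis_b @ rng.normal(size=(2, 10))])

W = low_rank_affinity(X)
cross = W[:10, 10:].max()    # largest affinity between the two types
within = W[:10, :10].max()   # largest affinity inside the first type
```

On such idealised data the affinity is block-diagonal, so cells of different types are not connected, and a graph-based clustering step applied to `W` would separate them. Real scRNA-Seq data is noisy and sparse, which is why regularised formulations are used in practice.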

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Gene Expression Profiling / methods
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods
  • Single-Cell Gene Expression Analysis*