scInterpreter: a knowledge-regularized generative model for interpretably integrating scRNA-seq data

BMC Bioinformatics. 2023 Dec 16;24(1):481. doi: 10.1186/s12859-023-05579-4.

Abstract

Background: The rapid emergence of single-cell RNA-seq (scRNA-seq) data presents remarkable opportunities for broad investigations through integration analyses. However, most integration models are black boxes that lack interpretability or are hard to train.

Results: To address these issues, we propose scInterpreter, a deep learning-based interpretable model. scInterpreter substantially outperforms other state-of-the-art (SOTA) models on multiple benchmark datasets. In addition, scInterpreter is extensible and can integrate and annotate atlas scRNA-seq data. We evaluated the robustness of scInterpreter in a variety of situations. Comparison experiments showed that a knowledge prior significantly accelerates the training process. Finally, we conducted an interpretability analysis of each dimension (pathway) of the cell representation in the embedding space.

Conclusions: The results showed that the cell representations obtained by scInterpreter are biologically meaningful. Through weight sorting, we found several new genes related to pathways in the PBMC dataset. Overall, scInterpreter is an effective and interpretable integration tool that we expect to facilitate the study of single-cell transcriptomics.
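
The weight sorting mentioned above might look roughly like the sketch below. The abstract does not describe the procedure, so everything here is an assumption for illustration: a gene-by-pathway weight matrix, ranking by absolute weight, the function name rank_genes_for_pathway, and the placeholder gene names and weight values, none of which come from the paper. Genes that rank highly for a pathway dimension but are absent from that pathway's prior gene set would be the candidates for "new" pathway-related genes.

```python
def rank_genes_for_pathway(weights, genes, pathway_index, known_genes, top_k=3):
    """Rank genes for one pathway dimension by absolute weight (hypothetical helper).

    weights: list of rows, one per gene; each row holds one weight per pathway.
    Returns up to top_k tuples of (gene, weight, is_in_prior_gene_set).
    """
    column = [row[pathway_index] for row in weights]
    # Sort gene indices by descending absolute weight.
    order = sorted(range(len(genes)), key=lambda i: abs(column[i]), reverse=True)
    return [(genes[i], column[i], genes[i] in known_genes) for i in order[:top_k]]


# Toy values only; these are not results from the paper.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
weights = [
    [0.9, 0.1],   # GENE_A
    [0.05, 0.8],  # GENE_B
    [0.4, 0.0],   # GENE_C
    [0.7, 0.2],   # GENE_D
]
known_in_pathway_0 = {"GENE_A", "GENE_C"}  # assumed prior gene set for pathway 0

ranked = rank_genes_for_pathway(weights, genes, 0, known_in_pathway_0)
# Highly weighted genes that the prior did not list for this pathway.
candidates = [gene for gene, _, is_known in ranked if not is_known]
print(ranked)
print(candidates)
```

Ranking by absolute value treats strongly negative weights as informative too; whether that matches the authors' procedure is not stated in the abstract.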

Keywords: Batch correction; Deep learning; Integration; Knowledge-regularized; Single-cell RNA-seq.

MeSH terms

  • Cluster Analysis
  • Gene Expression Profiling / methods
  • Leukocytes, Mononuclear* / metabolism
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods
  • Single-Cell Gene Expression Analysis*