scGEM: Unveiling the Nested Tree-Structured Gene Co-Expressing Modules in Single Cell Transcriptome Data

Cancers (Basel). 2023 Aug 26;15(17):4277. doi: 10.3390/cancers15174277.

Abstract

Background: Single-cell transcriptome analysis has fundamentally changed biological research by allowing higher-resolution computational analysis of individual cells and subsets of cell types. However, few methods have met the need to recognize and quantify the underlying cellular programs that determine the specialization and differentiation of the cell types.

Methods: In this study, we present scGEM, a nested tree-structured nonparametric Bayesian model, to reveal the gene co-expression modules (GEMs) reflecting transcriptome processes in single cells.

Results: We show that scGEM can discover shared and specialized transcriptome signals across different cell types in single-cell data from peripheral blood mononuclear cells and early brain development. scGEM outperformed other methods in perplexity and topic coherence (p < 0.001) on our simulation data. Larger datasets, deeper trees and pre-trained models were associated with better scGEM performance. In additional validation on the bulk RNAseq dataset, the GEMs that scGEM obtained from triple-negative breast cancer single cells correlated better with lymphocyte infiltration (p = 0.009) and the cell cycle (p < 0.001) than those obtained by other methods.
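
For readers unfamiliar with perplexity, one of the two evaluation metrics named in the Results, the following is a minimal sketch of how it is commonly computed for a topic model (lower is better). It assumes a generic flat parameterization with per-cell module weights `theta` and per-module gene distributions `beta`; these names and the function `perplexity` are illustrative assumptions and are not taken from scGEM, whose nested tree-structured model would derive the per-cell gene probabilities differently.

```python
import numpy as np

def perplexity(counts, theta, beta):
    """Perplexity of a count matrix under a flat topic model (illustrative only).

    counts: cells x genes matrix of observed counts
    theta:  cells x modules matrix, each row sums to 1 (hypothetical name)
    beta:   modules x genes matrix, each row sums to 1 (hypothetical name)
    """
    counts = np.asarray(counts, dtype=float)
    # Probability of each gene in each cell: mixture of module distributions.
    probs = np.asarray(theta, dtype=float) @ np.asarray(beta, dtype=float)
    # Clip to avoid log(0) for genes assigned zero probability.
    log_lik = np.sum(counts * np.log(np.clip(probs, 1e-12, None)))
    # Exponentiated negative average log-likelihood per observed count.
    return float(np.exp(-log_lik / counts.sum()))

# Usage: a model that spreads probability uniformly over 4 genes
# is exactly as uncertain as a fair 4-way choice.
uniform_score = perplexity(
    counts=[[1, 2, 3, 4]],
    theta=[[0.5, 0.5]],
    beta=[[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]],
)
```

In a real comparison the counts would come from held-out cells rather than the training data, so that the score reflects generalization rather than fit.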

Conclusions: Altogether, we demonstrate that scGEM can be used to model the hidden cellular functions of single cells, thereby unveiling the specialization and generalization of transcriptomic programs across different types of cells.

Keywords: cellular program; gene co-expressing module; nested tree structure; single cell transcriptome; topic model.