Clustering single-cell multimodal omics data with jrSiCKLSNMF

Front Genet. 2023 Jun 9;14:1179439. doi: 10.3389/fgene.2023.1179439. eCollection 2023.

Abstract

Introduction: The development of multimodal single-cell omics methods has enabled the collection of data across different omics modalities from the same set of single cells. Each omics modality provides unique information about cell type and function, so integrating data from different modalities can provide deeper insights into cellular functions. Single-cell omics data are often challenging to model because of high dimensionality, sparsity, and technical noise.

Methods: We propose a novel multimodal data analysis method called joint graph-regularized Single-Cell Kullback-Leibler Sparse Non-negative Matrix Factorization (jrSiCKLSNMF, pronounced "junior sickles NMF") that extracts latent factors shared across omics modalities within the same set of single cells.

Results: We compare our clustering algorithm to several existing methods on four datasets simulated with third-party software. We also apply our algorithm to a real set of cell line data.

Discussion: We show overwhelmingly better clustering performance than several existing methods on the simulated data. On a real multimodal omics dataset, our method also produces scientifically accurate clustering results.
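
To make the idea in the Methods statement concrete, the following is a minimal sketch of a joint NMF under the KL divergence with multiplicative updates, in which each modality gets its own non-negative basis matrix and all modalities share one latent factor matrix over the same cells. It is not the jrSiCKLSNMF algorithm: the graph regularization and sparsity terms are omitted, and the function names (`joint_kl_nmf`, `kl_divergence`), the matrix layout (features by cells), and the toy Poisson data are assumptions made purely for illustration.

```python
import numpy as np

def kl_divergence(X, WH, eps=1e-10):
    """Generalized KL divergence D(X || WH), summed over all entries."""
    return float(np.sum(X * np.log((X + eps) / (WH + eps)) - X + WH))

def joint_kl_nmf(Xs, k, n_iter=200, seed=0, eps=1e-10):
    """Factor each X_v (features x cells) as W_v @ H with one shared H.

    Illustrative only: plain multiplicative updates for the summed KL
    objective, with no graph regularization or sparsity penalty.
    """
    rng = np.random.default_rng(seed)
    n_cells = Xs[0].shape[1]
    Ws = [rng.uniform(0.1, 1.0, size=(X.shape[0], k)) for X in Xs]
    H = rng.uniform(0.1, 1.0, size=(k, n_cells))
    losses = []
    for _ in range(n_iter):
        # Update each modality-specific basis W_v with H held fixed.
        for v, X in enumerate(Xs):
            ratio = X / (Ws[v] @ H + eps)
            Ws[v] *= (ratio @ H.T) / (H.sum(axis=1)[None, :] + eps)
        # Update the shared H by pooling numerators and denominators
        # over all modalities.
        num = np.zeros_like(H)
        den = np.zeros((k, 1))
        for W, X in zip(Ws, Xs):
            num += W.T @ (X / (W @ H + eps))
            den += W.sum(axis=0)[:, None]
        H *= num / (den + eps)
        losses.append(sum(kl_divergence(X, W @ H) for X, W in zip(Xs, Ws)))
    return Ws, H, losses

# Toy example (synthetic counts, not data from the paper): two modalities
# measured on the same 60 cells, generated from a shared latent matrix.
rng = np.random.default_rng(1)
H_true = rng.gamma(2.0, 1.0, size=(3, 60))
X_rna = rng.poisson(rng.gamma(2.0, 1.0, size=(40, 3)) @ H_true).astype(float)
X_atac = rng.poisson(rng.gamma(2.0, 1.0, size=(25, 3)) @ H_true).astype(float)

Ws, H, losses = joint_kl_nmf([X_rna, X_atac], k=3)
print(H.shape, losses[0] > losses[-1])
```

In a sketch like this, each column of the shared `H` is a low-dimensional representation of one cell, so the columns could be passed to any standard clustering routine to obtain cell clusters.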

Keywords: KL divergence; graph regularization; multimodal omics; multiplicative updates; scATAC-seq; scRNA-seq; sparsity.

Grants and funding

This work was partially supported by NIH grant 1UL1TR000064 from the Center for Scientific Review.