Distribution-Agnostic Deep Learning Enables Accurate Single-Cell Data Recovery and Transcriptional Regulation Interpretation

Adv Sci (Weinh). 2024 Apr;11(16):e2307280. doi: 10.1002/advs.202307280. Epub 2024 Feb 21.

Abstract

Single-cell RNA sequencing (scRNA-seq) is a robust method for studying gene expression at the single-cell level, but accurately quantifying genetic material is often hindered by limited mRNA capture, resulting in many missing expression values. Existing imputation methods rely on strict data assumptions, limiting their broader application, and lack reliable supervision, leading to biased signal recovery. To address these challenges, authors developed Bis, a distribution-agnostic deep learning model for accurately recovering missing sing-cell gene expression from multiple platforms. Bis is an optimal transport-based autoencoder model that can capture the intricate distribution of scRNA-seq data while addressing the characteristic sparsity by regularizing the cellular embedding space. Additionally, they propose a module using bulk RNA-seq data to guide reconstruction and ensure expression consistency. Experimental results show Bis outperforms other models across simulated and real datasets, showcasing superiority in various downstream analyses including batch effect removal, clustering, differential expression analysis, and trajectory inference. Moreover, Bis successfully restores gene expression levels in rare cell subsets in a tumor-matched peripheral blood dataset, revealing developmental characteristics of cytokine-induced natural killer cells within a head and neck squamous cell carcinoma microenvironment.

Keywords: imputation; optimal transport; single‐cell RNA sequencing.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Deep Learning*
  • Gene Expression Profiling / methods
  • Humans
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods