Accurate integration of multiple heterogeneous single-cell RNA-seq data sets by learning contrastive biological variation

Genome Res. 2023 May;33(5):750-762. doi: 10.1101/gr.277522.122. Epub 2023 Jun 12.

Abstract

For most biological and medical applications of single-cell transcriptomics, an integrative study of multiple heterogeneous single-cell RNA sequencing (scRNA-seq) data sets is crucial. However, existing approaches are unable to integrate diverse data sets from various biological conditions effectively because of the confounding effects of biological and technical differences. We introduce single-cell integration (scInt), an integration method based on accurate, robust cell-cell similarity construction and unified contrastive biological variation learning from multiple scRNA-seq data sets. scInt provides a flexible and effective approach to transfer knowledge from the already integrated reference to the query. We show that scInt outperforms 10 other cutting-edge approaches using both simulated and real data sets, particularly in the case of complex experimental designs. Application of scInt to developing mouse tracheal epithelial data shows its ability to integrate developmental trajectories from different developmental stages. Furthermore, scInt successfully identifies functionally distinct condition-specific cell subpopulations in single-cell heterogeneous samples from a variety of biological conditions.

MeSH terms

  • Animals
  • Exome Sequencing
  • Gene Expression Profiling / methods
  • Mice
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods
  • Single-Cell Gene Expression Analysis*