CL-Impute: A contrastive learning-based imputation for dropout single-cell RNA-seq data

Comput Biol Med. 2023 Sep:164:107263. doi: 10.1016/j.compbiomed.2023.107263. Epub 2023 Jul 23.

Abstract

Background: Single-cell RNA-sequencing (scRNA-seq) technology has revolutionized the study of cell heterogeneity and biological interpretation at the single-cell level. However, the dropout events commonly present in scRNA-seq data can markedly reduce the reliability of downstream analysis. Existing imputation methods often overlook the discrepancy between the cell relationships established from noisy, dropout-affected data and the true relationships. As a result, the cell representations they learn are untrustworthy, which limits their performance.

Method: Here, we propose a novel approach, CL-Impute (Contrastive Learning-based Impute), for estimating missing gene expression values without relying on preconstructed cell relationships. CL-Impute combines contrastive learning with a self-attention network to address this challenge. Specifically, contrastive learning learns cell representations from the self-perspective of dropout events, whereas the self-attention network captures cell relationships from a global perspective.
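
To illustrate what "capturing cell relationships from a global perspective" can mean, here is a minimal sketch of scaled dot-product self-attention applied across cells. It assumes each cell is a row vector and, as a deliberate simplification, uses the cell vectors directly as queries, keys and values (single head, no learned projections). The abstract does not specify CL-Impute's actual layer configuration, so the function name `cell_self_attention` and everything below are illustrative only.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cell_self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention across cells (rows of x).

    Assumption: queries, keys and values are the cell vectors themselves
    (single head, no learned projection matrices), a simplification of a
    trained self-attention layer. Returns (attention_weights, output).
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    # Similarity score for every pair of cells, scaled by 1/sqrt(d).
    scores = [[scale * sum(a * b for a, b in zip(xi, xj)) for xj in x] for xi in x]
    # Each row becomes a distribution over ALL cells (the global view).
    weights = [softmax(row) for row in scores]
    # Each cell's output is a weighted average of every cell's vector.
    out = [
        [sum(w * xj[k] for w, xj in zip(wi, x)) for k in range(d)]
        for wi in weights
    ]
    return weights, out


# Cells 0 and 1 share an expression profile; cell 2 differs.
cells = [[1.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
weights, attended = cell_self_attention(cells)
```

Here cell 0 places equal weight on itself and on cell 1 and less weight on cell 2, so similar cells dominate each other's attended representation without any precomputed cell graph.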

Results: Experimental results on four benchmark datasets, covering quantitative assessment, cell clustering, gene identification, and trajectory inference, demonstrate that CL-Impute outperforms existing state-of-the-art imputation methods. Furthermore, our experiment reveals that combining contrastive learning with masking cell augmentation enables the model to learn the actual latent features from noisy data with a high rate of dropout events, enhancing the reliability of the imputed values.
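
The pairing of masking cell augmentation with contrastive learning can be sketched as follows, under stated assumptions. The masking rule (zeroing each non-zero entry with probability `mask_rate`), the InfoNCE objective, the cosine similarity and the temperature of 0.5 are all hypothetical choices standing in for details the abstract does not give, and the encoder network is omitted (treated as the identity). The idea is that two independently masked views of the same cell form a positive pair, while views of other cells serve as negatives.

```python
import math
import random
from typing import List

Vector = List[float]


def mask_cell(cell: Vector, mask_rate: float, rng: random.Random) -> Vector:
    """Hypothetical masking augmentation: zero each non-zero entry with
    probability mask_rate, mimicking additional dropout events."""
    return [0.0 if v != 0.0 and rng.random() < mask_rate else v for v in cell]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # An all-zero view has no direction; treat its similarity as 0.
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.5) -> float:
    """InfoNCE loss: anchors[i] should be most similar to positives[i];
    positives[j] for j != i act as negatives. Returns the mean loss."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)
        # Numerically stable log-sum-exp over all candidates.
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / n


# Two independently masked views of each cell (encoder omitted: identity).
rng = random.Random(0)
cells = [[5.0, 0.0, 3.0, 1.0], [0.0, 4.0, 0.0, 2.0], [2.0, 2.0, 0.0, 0.0]]
view_a = [mask_cell(c, 0.2, rng) for c in cells]
view_b = [mask_cell(c, 0.2, rng) for c in cells]
loss = info_nce(view_a, view_b)
```

Minimizing this loss pulls the two masked views of a cell together and pushes different cells apart. This is one way a model could learn representations that are robust to missing entries rather than tied to the observed zeros.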

Conclusions: CL-Impute is a novel contrastive learning-based method for imputing scRNA-seq data with high dropout rates. The source code of CL-Impute is available at https://github.com/yuchen21-web/Imputation-for-scRNA-seq.

Keywords: Contrastive learning; Downstream analysis; Dropout events; Imputation; scRNA-seq.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Gene Expression Profiling
  • Reproducibility of Results
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods
  • Single-Cell Gene Expression Analysis*
  • Software