A posterior probability based Bayesian method for single-cell RNA-seq data imputation

Methods. 2023 Aug:216:21-38. doi: 10.1016/j.ymeth.2023.06.004. Epub 2023 Jun 12.

Abstract

Single-cell RNA-sequencing (scRNA-seq) data suffer from a lot of zeros. Such dropout events impede the downstream data analyses. We propose BayesImpute to infer and impute dropouts from the scRNA-seq data. Using the expression rate and coefficient of variation of the genes within the cell subpopulation, BayesImpute first determines likely dropouts, and then constructs the posterior distribution for each gene and uses the posterior mean to impute dropout values. Some simulated and real experiments show that BayesImpute can effectively identify dropout events and reduce the introduction of false positive signals. Additionally, BayesImpute successfully recovers the true expression levels of missing values, restores the gene-to-gene and cell-to-cell correlation coefficient, and maintains the biological information in bulk RNA-seq data. Furthermore, BayesImpute boosts the clustering and visualization of cell subpopulations and improves the identification of differentially expressed genes. We further demonstrate that, in comparison to other statistical-based imputation methods, BayesImpute is scalable and fast with minimal memory usage.

Keywords: Bayesian model; Dropouts; Imputation; Single-cell RNA-seq.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Gene Expression Profiling
  • Probability
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods
  • Single-Cell Gene Expression Analysis*
  • Software*