TsImpute: an accurate two-step imputation method for single-cell RNA-seq data

Bioinformatics. 2023 Dec 1;39(12):btad731. doi: 10.1093/bioinformatics/btad731.

Abstract

Motivation: Single-cell RNA sequencing (scRNA-seq) technology has enabled the discovery of gene expression patterns at single-cell resolution. However, due to technical limitations, scRNA-seq data usually contain excessive zeros, called "dropouts," which may mislead downstream analysis. It is therefore crucial to impute these dropouts to recover the biological information.

Results: We propose a two-step imputation method called tsImpute to impute scRNA-seq data. In the first step, tsImpute adopts a zero-inflated negative binomial distribution to discriminate dropouts from true zeros and performs an initial imputation by calculating the expected expression level. In the second step, it clusters cells using this modified expression matrix and then performs the final distance-weighted imputation based on the resulting clusters. Numerical results on both simulated and real data show that tsImpute achieves favorable performance in terms of gene expression recovery, cell clustering, and differential expression analysis.
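
The two steps described above can be illustrated with a minimal Python sketch. It is not the tsImpute implementation (which is distributed as an R package), and it rests on several assumptions that are not stated in the abstract: the per-gene ZINB parameters (`pi`, `mu`, `theta`) are taken as given rather than estimated, a zero is flagged as a dropout when its posterior dropout probability exceeds a hypothetical `threshold` of 0.5, clustering is done with a plain k-means on log-transformed values, and the final weights are inverse Euclidean distances to cells in the same cluster. All function names are hypothetical.

```python
import numpy as np

def dropout_posterior(pi, mu, theta):
    """P(dropout | observed zero) for a ZINB model with dropout rate pi,
    NB mean mu and NB dispersion (size) theta; works per gene."""
    nb_zero = (theta / (theta + mu)) ** theta  # NB probability of a true zero
    return pi / (pi + (1.0 - pi) * nb_zero)

def initial_impute(X, pi, mu, theta, threshold=0.5):
    """Step 1 (assumed form): flag likely dropouts and fill them with the
    expected expression post * mu. X is a cells x genes count matrix."""
    post = dropout_posterior(pi, mu, theta)            # shape (genes,)
    flagged = (X == 0) & (post > threshold)[None, :]   # shape (cells, genes)
    X1 = X.astype(float).copy()
    X1[flagged] = np.broadcast_to(post * mu, X.shape)[flagged]
    return X1, flagged

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def final_impute(X1, flagged, labels, eps=1e-8):
    """Step 2 (assumed form): replace each flagged entry with an
    inverse-distance weighted average over the other cells in its cluster."""
    out = X1.copy()
    logX1 = np.log1p(X1)
    idx = np.arange(X1.shape[0])
    for i in idx:
        if not flagged[i].any():
            continue
        nbrs = np.flatnonzero((labels == labels[i]) & (idx != i))
        if nbrs.size == 0:
            continue  # singleton cluster: keep the step-1 value
        d = np.linalg.norm(logX1[nbrs] - logX1[i], axis=1)
        w = 1.0 / (d + eps)
        w /= w.sum()
        out[i, flagged[i]] = w @ X1[nbrs][:, flagged[i]]
    return out

def two_step_impute(X, pi, mu, theta, k):
    """Run both steps; returns the imputed matrix and the dropout mask."""
    X1, flagged = initial_impute(X, pi, mu, theta)
    labels = kmeans(np.log1p(X1), k)
    return final_impute(X1, flagged, labels), flagged
```

In this sketch, entries that are not flagged as dropouts are left unchanged, so observed non-zero counts and zeros judged to be true zeros pass through both steps untouched. Whether the published method weights neighbours, chooses the number of clusters, or thresholds the dropout probability in this way is not specified in the abstract.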

Availability and implementation: The R package of tsImpute is available at https://github.com/ZhengWeihuaYNU/tsImpute.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Exome Sequencing
  • Gene Expression Profiling
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis
  • Single-Cell Gene Expression Analysis*
  • Software*