FRMC: a fast and robust method for the imputation of scRNA-seq data

RNA Biol. 2021 Oct 15;18(sup1):172-181. doi: 10.1080/15476286.2021.1960688. Epub 2021 Aug 30.

Abstract

Single-cell transcriptome sequencing lets researchers observe gene expression profiles at the resolution of individual cells, opening many possibilities for subsequent biomedical investigation. However, insufficient RNA input unavoidably produces a high proportion of missing values in the gene-cell expression matrix, which severely hampers the accuracy of downstream analysis. Addressing this problem requires a faster, more stable, and more accurate imputation method, one that not only recovers the missing data but also facilitates the subsequent analysis of biological mechanisms. Existing imputation methods all have drawbacks and limitations: some assume a data distribution in advance, some cannot distinguish technical zeros from biological zeros, and some have poor computational performance. In this paper, we present FRMC, a novel imputation software for single-cell RNA-seq data built on a fast and accurate singular value thresholding approximation method. Experiments demonstrate that FRMC not only precisely distinguishes 'true zeros' from dropout events and correctly imputes missing values attributable to technical noise, but also effectively enhances intracellular and intergenic connections and achieves accurate clustering of cells in biological applications. In summary, FRMC can be a powerful tool for analysing single-cell data because it simultaneously ensures biological significance, accuracy, and speed. FRMC is implemented in Python and is freely accessible to non-commercial users on GitHub: https://github.com/HUST-DataMan/FRMC.
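For readers unfamiliar with singular value thresholding (SVT), the following is a minimal sketch of the classic SVT iteration for low-rank matrix completion. It is not FRMC's implementation, and the fast approximation the abstract refers to is not reproduced here. The function names (`svt_shrink`, `svt_complete`), the parameters `tau`, `step`, `n_iter`, and `tol`, and the explicit `mask` marking which entries are trusted as observed are all assumptions made for illustration.

```python
import numpy as np


def svt_shrink(matrix, tau):
    """Soft-threshold the singular values of `matrix` by `tau`."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # small singular values become zero
    return (u * s_shrunk) @ vt


def svt_complete(observed, mask, tau, step=1.2, n_iter=300, tol=1e-4):
    """Illustrative SVT iteration for matrix completion.

    observed : 2-D array; only entries where `mask` is True are trusted.
    mask     : boolean array of the same shape (hypothetical input; how to
               separate true zeros from dropouts is not addressed here).
    tau      : singular value threshold; larger values favour lower rank.
    step     : step size of the dual update (assumed to lie in (0, 2)).
    """
    y = np.zeros_like(observed, dtype=float)  # dual variable
    x = np.zeros_like(observed, dtype=float)  # current low-rank estimate
    norm_obs = np.linalg.norm(observed[mask])
    for _ in range(n_iter):
        x = svt_shrink(y, tau)
        # Residual is measured on trusted entries only.
        residual = np.where(mask, observed - x, 0.0)
        if np.linalg.norm(residual) <= tol * norm_obs:
            break
        y += step * residual
    return x
```

In this sketch, entries outside `mask` are filled in by the low-rank estimate, which is the sense in which thresholding singular values can impute missing values. Each iteration requires a full SVD, which is the cost that an approximate or accelerated SVT scheme would aim to reduce.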

Keywords: Imputation; dropout event; low-rank matrix optimization; scRNA-seq; singular value thresholding iteration; sparsity.

MeSH terms

  • Exome Sequencing / methods*
  • Gene Expression Profiling*
  • Humans
  • RNA-Seq / methods*
  • Sequence Analysis, RNA / methods*
  • Single-Cell Analysis / methods*
  • Software*