Dimensionality Reduction of Single-Cell RNA Sequencing Data by Combining Entropy and Denoising AutoEncoder

J Comput Biol. 2022 Oct;29(10):1074-1084. doi: 10.1089/cmb.2022.0118. Epub 2022 Jul 14.

Abstract

Single-cell RNA sequencing (scRNA-seq) measures gene expression in individual cells and thereby reveals cellular heterogeneity at higher resolution. However, scRNA-seq data still pose computational problems, including high dimensionality, high sparsity, and high noise. Dimensionality reduction is essential for addressing them because it reduces the number of dimensions and also removes most of the zeros and noise. We therefore propose ScEDA, a hybrid dimensionality reduction algorithm for scRNA-seq data that integrates binning-based entropy with a denoising autoencoder. In ScEDA, a novel binning-based entropy estimation method selects efficient genes while removing noise. For each gene, the binning-based entropy describes how its expression varies across all cells, that is, the distribution of its expression over all cells. Genes with low binning-based entropy are regarded as inefficient and removed. Moreover, by combining Kullback-Leibler (KL) divergence with the autoencoder, the objective function is reformulated to maximize the similarity in distribution between the input data and the reconstructed data. Furthermore, the denoising autoencoder adds Poisson-distributed noise to the original input data to improve robustness. Compared with three other clustering methods, ScEDA provides superior average performance on 16 real scRNA-seq datasets, with a clear improvement on large-scale datasets.
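The three ingredients named in the abstract (entropy-based gene filtering, Poisson corruption of the input, and a KL-divergence term) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the number of bins, the entropy threshold, the Poisson rate, or the exact form of the KL term, so `n_bins`, `threshold`, `lam`, the per-cell normalization, and all function names below are assumptions made for illustration. The autoencoder itself is omitted.

```python
import numpy as np


def binned_entropy(expr, n_bins=10):
    """Shannon entropy (in nats) of each gene's expression histogram.

    expr: array of shape (n_cells, n_genes).
    n_bins is an assumed value; the abstract does not specify it.
    """
    n_cells, n_genes = expr.shape
    ent = np.zeros(n_genes)
    for g in range(n_genes):
        # Equal-width bins over the gene's observed expression range.
        counts, _ = np.histogram(expr[:, g], bins=n_bins)
        p = counts[counts > 0] / n_cells
        ent[g] = -np.sum(p * np.log(p))
    return ent


def select_genes(expr, threshold, n_bins=10):
    """Keep genes whose binned entropy exceeds an assumed threshold."""
    keep = binned_entropy(expr, n_bins) > threshold
    return expr[:, keep], keep


def add_poisson_noise(expr, lam=1.0, seed=0):
    """Corrupt the input with additive Poisson noise (lam is an assumption)."""
    rng = np.random.default_rng(seed)
    return expr + rng.poisson(lam, size=expr.shape)


def kl_divergence(x, x_hat, eps=1e-8):
    """Mean per-cell KL(p || q), where p and q are the rows of x and x_hat
    normalized to sum to 1. This normalization is an assumption, chosen so
    that both arguments are valid probability distributions."""
    p = (x + eps) / (x + eps).sum(axis=1, keepdims=True)
    q = (x_hat + eps) / (x_hat + eps).sum(axis=1, keepdims=True)
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))
```

In a pipeline of this shape, `select_genes` would run first on the cells-by-genes matrix, `add_poisson_noise` would produce the corrupted input fed to the encoder, and `kl_divergence` between the clean input and the decoder output would be one term of the training loss. A gene expressed identically in every cell falls into a single bin and gets entropy 0, which is why low-entropy genes carry little information for separating cells.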

Keywords: binning-based entropy; clustering; denoising AutoEncoder; dimensionality reduction; single-cell RNA-seq data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Entropy
  • Gene Expression Profiling / methods
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods