A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data

Nat Commun. 2022 Apr 7;13(1):1901. doi: 10.1038/s41467-022-29576-y.

Abstract

Single cell RNA sequencing (scRNA-Seq) is being widely used in biomedical research and generated enormous volume and diversity of data. The raw data contain multiple types of noise and technical artifacts, which need thorough cleaning. Existing denoising and imputation methods largely focus on a single type of noise (i.e., dropouts) and have strong distribution assumptions which greatly limit their performance and application. Here we design and develop the AutoClass model, integrating two deep neural network components, an autoencoder, and a classifier, as to maximize both noise removal and signal retention. AutoClass is distribution agnostic as it makes no assumption on specific data distributions, hence can effectively clean a wide range of noise and artifacts. AutoClass outperforms the state-of-art methods in multiple types of scRNA-Seq data analyses, including data recovery, differential expression analysis, clustering analysis, and batch effect removal. Importantly, AutoClass is robust on key hyperparameter settings including bottleneck layer size, pre-clustering number and classifier weight. We have made AutoClass open source at: https://github.com/datapplab/AutoClass .

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Gene Expression Profiling* / methods
  • Neural Networks, Computer
  • RNA-Seq
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods