scIAE: an integrative autoencoder-based ensemble classification framework for single-cell RNA-seq data

Brief Bioinform. 2022 Jan 17;23(1):bbab508. doi: 10.1093/bib/bbab508.

Abstract

Single-cell RNA sequencing (scRNA-seq) allows quantitative analysis of gene expression at the level of single cells, which is beneficial for studying cell heterogeneity. Recognizing cell types facilitates the construction of cell atlases of complex tissues or organisms and is the basis of almost all downstream scRNA-seq data analyses. Predicting disease status from disease-related scRNA-seq data can facilitate specific diagnosis and personalized treatment. Because single-cell gene expression data are high-dimensional and sparse with dropouts, we propose scIAE, an integrative autoencoder-based ensemble classification framework. scIAE first performs multiple random projections and applies integrative and devisable autoencoders (integrating stacked, denoising and sparse autoencoders) to obtain compressed representations. Base classifiers are then built on the lower-dimensional representations, and the predictions from all base models are integrated. A comparison of scIAE with common feature extraction methods shows that scIAE is effective and robust, independent of the choice of dimension, which benefits subsequent cell classification. Tests on different types of data and comparisons with existing general and single-cell-specific classification methods show that scIAE has strong classification power in cell type annotation within datasets, across batches, across platforms and across species, as well as in disease status prediction. The architecture of scIAE is flexible and devisable, and it is available at https://github.com/JGuan-lab/scIAE.
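
The pipeline described in the abstract (random projection, autoencoder compression, base classifiers, integration of predictions) can be illustrated with a minimal NumPy sketch. This is not the scIAE implementation from the repository. Several simplifications are assumptions made here for brevity: a single hidden layer stands in for the stacked autoencoder, a nearest-centroid rule stands in for the base classifier, majority voting stands in for the integration step, and the toy data, function names and hyperparameters are all hypothetical.

```python
import numpy as np

def train_autoencoder(X, n_hidden, rng, noise_sd=0.1, l1=1e-3, lr=0.01, epochs=300):
    """One-hidden-layer denoising autoencoder with an L1 sparsity penalty.

    Assumption: a single layer replaces the stacked architecture of the paper.
    Returns the encoder weights and bias.
    """
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        X_in = X + rng.normal(0.0, noise_sd, X.shape)  # denoising: corrupt the input
        H = np.tanh(X_in @ W1 + b1)                    # hidden (compressed) code
        err = (H @ W2 + b2 - X) / n                    # gradient of 0.5 * mean squared error
        dH = err @ W2.T + l1 * np.sign(H) / n          # add the L1 sparsity gradient
        dZ = dH * (1.0 - H ** 2)                       # backpropagate through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X_in.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return W1, b1

def fit_base_model(X, y, n_proj, n_hidden, rng):
    """Random projection -> autoencoder -> nearest-centroid classifier (assumed)."""
    R = rng.normal(0.0, 1.0 / np.sqrt(n_proj), (X.shape[1], n_proj))
    P = X @ R
    mu, sd = P.mean(axis=0), P.std(axis=0) + 1e-8
    W1, b1 = train_autoencoder((P - mu) / sd, n_hidden, rng)

    def encode(A):
        return np.tanh(((A @ R - mu) / sd) @ W1 + b1)

    Z = encode(X)
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])

    def predict(A):
        diffs = encode(A)[:, None, :] - centroids[None, :, :]
        return classes[(diffs ** 2).sum(axis=2).argmin(axis=1)]

    return predict

def ensemble_predict(models, X):
    """Majority vote over base models (assumes non-negative integer labels)."""
    votes = np.stack([m(X) for m in models])  # shape: (n_models, n_cells)
    return np.array([np.bincount(col).argmax() for col in votes.T])

def make_toy_data(n_cells, n_genes, rng):
    """Hypothetical two-class data with simulated dropouts (not real scRNA-seq)."""
    y = rng.integers(0, 2, n_cells)
    X = rng.normal(0.0, 1.0, (n_cells, n_genes))
    X[:, :50] += 4.0 * y[:, None]        # 50 marker genes shifted in class 1
    X[rng.random(X.shape) < 0.3] = 0.0   # roughly 30% of entries zeroed out
    return X, y

rng = np.random.default_rng(0)
X, y = make_toy_data(300, 200, rng)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

# Each base model draws its own random projection and trains its own autoencoder.
models = [fit_base_model(X_train, y_train, n_proj=16, n_hidden=8, rng=rng)
          for _ in range(5)]
y_pred = ensemble_predict(models, X_test)
accuracy = (y_pred == y_test).mean()
print(f"ensemble accuracy on toy data: {accuracy:.2f}")
```

Using an odd number of base models avoids ties in a two-class majority vote. The design point the sketch tries to convey is that each base model sees a different random low-dimensional view of the same cells, so the ensemble depends less on any single choice of projection or dimension.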

Keywords: classification framework; ensemble learning; feature extraction; integrative autoencoder; scRNA-seq data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Analysis*
  • Exome Sequencing
  • Gene Expression Profiling
  • RNA-Seq
  • Sequence Analysis, RNA
  • Single-Cell Analysis* / methods