scESI: evolutionary sparse imputation for single-cell transcriptomes from nearest neighbor cells

Brief Bioinform. 2022 Sep 20;23(5):bbac144. doi: 10.1093/bib/bbac144.

Abstract

The ubiquitous dropout problem in single-cell RNA sequencing (scRNA-seq) introduces substantial noise into gene expression profiles. To address this, we propose an evolutionary sparse imputation (ESI) algorithm for single-cell transcriptomes, which constructs a sparse representation model based on gene regulation relationships between cells. To solve this model, we design an optimization framework based on nondominated sorting genetic algorithms. This framework takes into account both the topological relationships between cells and the diversity of gene expression to iteratively search for the global optimum, thereby learning a Pareto-optimal cell-cell affinity matrix. Finally, we use the learned sparse relationship model between cells to improve data quality and reduce noise. On simulated datasets, scESI performed significantly better than benchmark methods across various metrics. Applied to real scRNA-seq datasets, scESI not only classified cell types further and separated cells successfully in visualizations but also improved performance in reconstructing differentiation trajectories and identifying differentially expressed genes. In addition, scESI successfully recovered the expression trends of marker genes during stem cell differentiation and can help discover new cell types and putative pathways regulating biological processes.
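The general idea of selecting a sparse cell-cell affinity matrix by Pareto dominance and then using it to impute dropouts can be illustrated with a toy sketch. This is not the scESI implementation and not the full NSGA-II loop (there is no crossover, mutation, or crowding distance). It only scores random candidate matrices restricted to each cell's k nearest neighbours and keeps the first nondominated front. The two objectives (reconstruction error ||X - WX||² and the number of nonzero weights), the helper names (`knn_indices`, `random_sparse_affinity`, `nondominated_front`), and the choice to fill only zero entries are all hypothetical assumptions made for illustration.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of each cell's k nearest neighbours (Euclidean), excluding the cell itself."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)  # a cell is never its own neighbour
    return np.argsort(d, axis=1)[:, :k]

def random_sparse_affinity(nbrs, n_cells, rng, keep_prob):
    """Hypothetical candidate: random weights on a random subset of each cell's kNN, rows sum to 1."""
    W = np.zeros((n_cells, n_cells))
    for i, row in enumerate(nbrs):
        mask = rng.random(len(row)) < keep_prob
        if not mask.any():
            mask[0] = True  # keep at least the closest neighbour
        w = rng.random(mask.sum())
        W[i, row[mask]] = w / w.sum()
    return W

def objectives(X, W):
    """Two illustrative objectives to minimise (assumed, not scESI's): fit error and nonzero count."""
    return np.array([np.linalg.norm(X - W @ X) ** 2, np.count_nonzero(W)])

def nondominated_front(F):
    """Indices of candidates not dominated by any other candidate (the first Pareto front)."""
    n = len(F)
    return [i for i in range(n)
            if not any(np.all(F[j] <= F[i]) and np.any(F[j] < F[i])
                       for j in range(n) if j != i)]

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(30, 10)).astype(float)  # toy counts: 30 cells x 10 genes
X[rng.random(X.shape) < 0.3] = 0.0                  # simulated dropouts

nbrs = knn_indices(X, k=5)
cands = [random_sparse_affinity(nbrs, X.shape[0], rng, p) for p in np.linspace(0.2, 1.0, 20)]
F = np.array([objectives(X, W) for W in cands])
front = nondominated_front(F)

best = min(front, key=lambda i: F[i, 0])  # pick the lowest-error member of the Pareto front
X_imputed = np.where(X == 0, cands[best] @ X, X)  # fill only zero entries from neighbours
```

Choosing a single member of the front (here, the one with the lowest reconstruction error) is one arbitrary policy; a real method would define its own objectives and selection rule.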

Keywords: imputation; multiobjective evolutionary algorithm; single-cell RNA-seq; sparse representation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Exome Sequencing
  • Gene Expression Profiling
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods
  • Transcriptome*