Leveraging data-driven self-consistency for high-fidelity gene expression recovery

Nat Commun. 2022 Nov 21;13(1):7142. doi: 10.1038/s41467-022-34595-w.

Abstract

Single-cell RNA sequencing (scRNA-seq) is a promising technique for determining the states of individual cells and classifying novel cell subtypes. In current sequencing data analysis, however, genes with low expression levels are omitted, which leads to inaccurate gene counts and hinders downstream analysis. Recovering these omitted expression values is challenging because of the large size of the data. Here, we introduce a data-driven gene expression recovery framework, referred to as the self-consistent expression recovery machine (SERM), to impute the missing expression values. The technique first uses a neural network to learn the underlying data distribution from a subset of the noisy data. It then recovers the full expression matrix by imposing self-consistency on it, ensuring that expression levels are similarly distributed across different parts of the matrix. We show that SERM improves imputation accuracy and offers orders-of-magnitude gains in computational efficiency compared with state-of-the-art imputation techniques.
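The two-stage idea (learn a distribution from a subset, then make every part of the matrix follow it) can be illustrated with a toy sketch. This is not SERM's implementation: the neural network is swapped for a plain empirical quantile function built from the nonzero entries of one reference block of cells, and "self-consistency" is approximated by rank-based quantile matching that forces every column block to share that reference distribution. Non-overlapping blocks, the block width `block_cols`, the reference index `ref_block`, and all function names are hypothetical choices made for illustration.

```python
def empirical_quantile(sorted_vals, q):
    """Linearly interpolated quantile of a pre-sorted list, for q in [0, 1]."""
    if not sorted_vals:
        raise ValueError("reference distribution is empty")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def average_ranks(values):
    """0-based ranks; tied values (e.g. many dropout zeros) share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def match_block(block_vals, reference_sorted):
    """Quantile matching: map each value to the reference quantile at its rank."""
    n = len(block_vals)
    if n == 1:
        return [empirical_quantile(reference_sorted, 0.5)]
    return [empirical_quantile(reference_sorted, r / (n - 1))
            for r in average_ranks(block_vals)]


def self_consistent_recover(matrix, block_cols, ref_block=0):
    """Toy recovery (hypothetical, not SERM): force all column blocks to one distribution.

    matrix: list of gene rows (genes x cells); zeros stand for dropouts.
    block_cols: hypothetical block width, in cells.
    ref_block: block whose nonzero values act as the "learned" distribution
        (a stand-in for the neural network, which this sketch does not implement).
    """
    n_cols = len(matrix[0])
    starts = list(range(0, n_cols, block_cols))

    def block_coords(s):
        return [(g, c) for g in range(len(matrix))
                for c in range(s, min(s + block_cols, n_cols))]

    # Stage 1 stand-in: reference distribution from nonzero entries of one block.
    reference = sorted(v for g, c in block_coords(starts[ref_block])
                       if (v := matrix[g][c]) > 0)
    # Stage 2 stand-in: every block (reference included) is matched to it.
    out = [list(row) for row in matrix]
    for s in starts:
        coords = block_coords(s)
        matched = match_block([matrix[g][c] for g, c in coords], reference)
        for (g, c), v in zip(coords, matched):
            out[g][c] = v
    return out


if __name__ == "__main__":
    toy = [[0, 2, 4, 0], [3, 0, 0, 5]]  # 2 genes x 4 cells, zeros = dropouts
    for row in self_consistent_recover(toy, block_cols=2):
        print([round(v, 3) for v in row])
```

Because the reference excludes zeros, dropout entries are mapped to small positive values, while the ordering of entries within each block is preserved and all blocks end up with the same value distribution. The sketch uses plain lists to stay dependency-free and requires Python 3.8+ for the `:=` operator; a realistic pipeline would operate on large normalized matrices with array libraries.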

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression
  • Selective Estrogen Receptor Modulators*

Substances

  • Selective Estrogen Receptor Modulators