Stratified Test Accurately Identifies Differentially Expressed Genes Under Batch Effects in Single-Cell Data

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2072-2079. doi: 10.1109/TCBB.2021.3094650. Epub 2021 Dec 8.

Abstract

Analyzing single-cell sequencing data from large cohorts is challenging. Discrepancies across experiments and differences among participants often lead to omissions and false discoveries in differentially expressed genes. We find that the Van Elteren test, a stratified version of the widely used Wilcoxon rank-sum test, elegantly mitigates the problem. We also modified the common language effect size to supplement this test, further improving its utility. On both simulated and real patient data, we show the ability of the Van Elteren test to control for false positives and false negatives. A comprehensive assessment using receiver operating characteristic (ROC) curves shows that the Van Elteren test achieves higher sensitivity and specificity on simulated datasets than nine state-of-the-art differential expression analysis methods. The effect size also estimates the differences between cell types more accurately.
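To make the idea of a stratified rank-sum test concrete, here is a minimal sketch of the textbook Van Elteren statistic for a single gene, with batches or participants as strata. It is not the authors' implementation. The function names (`midranks`, `van_elteren`), the 0/1 group coding, the 1/(n_h + 1) stratum weights, the tie-corrected variance, and the normal approximation for the p-value are all assumptions made for illustration. The modified common language effect size from the paper is not reproduced, because the abstract does not define it.

```python
import math
from collections import defaultdict


def midranks(xs):
    """Return 1-based ranks (ties get the average rank) and the tie-group sizes."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def van_elteren(values, groups, strata):
    """Stratified Wilcoxon rank-sum test (illustrative sketch).

    values: expression values of one gene, one per cell
    groups: 1 for cells in the group of interest, 0 for the reference group
    strata: batch or participant label for each cell
    Returns (z, two_sided_p) using a normal approximation.
    """
    by_stratum = defaultdict(list)
    for v, g, s in zip(values, groups, strata):
        by_stratum[s].append((v, g))

    numerator, variance = 0.0, 0.0
    for cells in by_stratum.values():
        xs = [v for v, _ in cells]
        gs = [g for _, g in cells]
        n1 = sum(1 for g in gs if g == 1)
        n2 = len(gs) - n1
        if n1 == 0 or n2 == 0:
            continue  # a stratum with only one group carries no information
        n = n1 + n2
        ranks, ties = midranks(xs)
        rank_sum = sum(r for r, g in zip(ranks, gs) if g == 1)
        expected = n1 * (n + 1) / 2
        tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
        var_h = n1 * n2 / 12 * ((n + 1) - tie_term)
        weight = 1 / (n + 1)  # assumed stratum weight
        numerator += weight * (rank_sum - expected)
        variance += weight ** 2 * var_h

    if variance == 0:
        return 0.0, 1.0  # no usable strata, or all values tied
    z = numerator / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


if __name__ == "__main__":
    # Two batches with different baselines but the same within-batch shift.
    values = [5, 6, 7, 1, 2, 3, 15, 16, 17, 11, 12, 13]
    groups = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
    strata = ["A"] * 6 + ["B"] * 6
    print(van_elteren(values, groups, strata))
```

Because ranks are computed within each stratum, a baseline offset between batches does not enter the statistic; only the within-batch ordering of the two groups does.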

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology / methods*
  • Humans
  • Neoplasms / genetics
  • Neoplasms / metabolism
  • RNA-Seq / methods*
  • ROC Curve
  • Retina / cytology
  • Retina / metabolism
  • Single-Cell Analysis / methods*
  • Statistics, Nonparametric
  • Transcriptome / genetics