pathVar: a new method for pathway-based interpretation of gene expression variability

PeerJ. 2017 May 23;5:e3334. doi: 10.7717/peerj.3334. eCollection 2017.

Abstract

Identifying the pathways that control a cellular phenotype is the first step toward building a mechanistic model. Recent examples in developmental biology, cancer genomics, and neurological disease have demonstrated how changes in the variability of gene expression can highlight important genes that are under different degrees of regulatory control. Simple statistical tests exist to identify differentially variable genes; however, methods for investigating changes in gene expression variability in the context of pathways and gene sets are under-explored. Here we present pathVar, a new method that provides functional interpretation of gene expression variability changes at the level of pathways and gene sets. pathVar is based on a multinomial exact test, or an asymptotic Chi-squared test as a more computationally efficient alternative. The method can be used for gene expression studies from any technology platform in all biological settings, either with a single phenotypic group or with two-group comparisons. To demonstrate its utility, we applied the method to a diverse set of diseases, species and samples. Results from pathVar are benchmarked against analyses based on average expression and two methods of GSEA, and demonstrate that analyses using both statistics (expression variability and average expression) are useful for understanding transcriptional regulation. We also provide recommendations for the choice of variability statistic, informed by analyses of simulations and real data. Based on the datasets selected, we show how pathVar can be used to gain insight into expression variability of single-cell versus bulk samples, different stem cell populations, and cancer versus normal tissue comparisons.
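
To make the multinomial exact test mentioned above more concrete, the following is a minimal Python sketch of a generic exact multinomial goodness-of-fit test. It is not the pathVar implementation. It assumes, for illustration only, that genes have already been assigned to a small number of variability categories, and that the counts for one pathway are compared with reference proportions taken from all genes. The function names, the three categories, and the numbers in the usage example are all hypothetical.

```python
from itertools import combinations
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` under a multinomial with `probs`."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact integer multinomial coefficient
    return coef * prod(p ** c for p, c in zip(probs, counts))


def compositions(n, k):
    """Yield every way to split n items into k ordered, non-negative counts."""
    # Stars and bars: choose k - 1 bar positions among n + k - 1 slots.
    for bars in combinations(range(n + k - 1), k - 1):
        edges = (-1,) + bars + (n + k - 1,)
        yield tuple(edges[i + 1] - edges[i] - 1 for i in range(k))


def multinomial_exact_test(observed, reference_probs):
    """Exact p-value: total probability of all outcomes that are no more
    likely than the observed one under the reference proportions."""
    p_obs = multinomial_pmf(observed, reference_probs)
    n, k = sum(observed), len(observed)
    p_value = 0.0
    for outcome in compositions(n, k):
        p = multinomial_pmf(outcome, reference_probs)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            p_value += p
    return p_value


if __name__ == "__main__":
    # Hypothetical example: proportions of all genes in three variability
    # categories (low, medium, high), and counts for a 12-gene pathway.
    reference = (0.5, 0.3, 0.2)
    pathway_counts = (2, 3, 7)
    print(multinomial_exact_test(pathway_counts, reference))
```

Enumerating every possible outcome becomes expensive as the number of genes or categories grows, which is the kind of cost that an asymptotic Chi-squared approximation avoids.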

Keywords: Bioinformatics; Cellular heterogeneity; Functional genomics; Gene expression variability; Single cell analysis; Transcriptional regulation.

Grants and funding

SZ and JCM were supported by the New York State Department of Health (NYSTEM Program) shared facility grant (C029154). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.