T2-DAG: a powerful test for differentially expressed gene pathways via graph-informed structural equation modeling

Bioinformatics. 2022 Jan 27;38(4):1005-1014. doi: 10.1093/bioinformatics/btab770.

Abstract

Motivation: A major task in genetic studies is to identify genes related to human diseases and traits in order to understand the functional characteristics of genetic mutations and enhance patient diagnosis. Compared with marginal analyses of individual genes, identification of gene pathways, i.e. sets of genes with known interactions that collectively contribute to specific biological functions, can provide more biologically meaningful results. Such gene pathway analysis can be formulated as a high-dimensional two-sample testing problem. Given the typically limited sample size of gene expression datasets, most existing two-sample tests tend to have reduced power because they ignore, or only inefficiently incorporate, the auxiliary pathway information on gene interactions.

Results: We propose T2-DAG, a Hotelling's T²-type test for detecting differentially expressed gene pathways, which efficiently leverages the auxiliary pathway information on gene interactions from existing pathway databases through a linear structural equation model. We further establish its asymptotic distribution under pertinent assumptions. Simulation studies under various scenarios show that T2-DAG outperforms several representative existing methods, with well-controlled type-I error rates and substantially improved power, even with incomplete or inaccurate pathway information or unadjusted confounding effects. We also illustrate the performance of T2-DAG in an application to detect differentially expressed KEGG pathways between different stages of lung cancer.
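
For orientation, the classical two-sample setting that a Hotelling's T²-type test builds on can be sketched as below. This is the textbook statistic, not necessarily the exact T2-DAG statistic. The symbols (n_1, n_2, \bar{X}_1, \bar{X}_2, S, A, D) are notation assumed here for illustration, and the last line shows only the standard way a linear structural equation model on a directed acyclic graph implies a structured precision matrix.

```latex
% Two-sample mean comparison for a pathway of p genes (assumed notation)
H_0:\ \mu_1 = \mu_2 \quad \text{vs.} \quad H_1:\ \mu_1 \neq \mu_2

% Classical Hotelling's T^2 with pooled sample covariance S
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{X}_1 - \bar{X}_2)^{\top} S^{-1} (\bar{X}_1 - \bar{X}_2)

% Linear SEM on a DAG: X = A X + \varepsilon, \operatorname{Cov}(\varepsilon) = D (diagonal),
% where A_{jk} \neq 0 only if gene k is a parent of gene j in the pathway graph
\Omega = \Sigma^{-1} = (I - A)^{\top} D^{-1} (I - A)
```

When the number of genes p exceeds n_1 + n_2 - 2, the pooled covariance S is singular and the classical statistic is undefined, which is one reason a graph-constrained estimate of the precision matrix can help in small samples.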

Availability and implementation: The R (R Development Core Team, 2021) package T2DAG, which implements the proposed T2-DAG test, is available on GitHub at https://github.com/Jin93/T2DAG.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computer Simulation
  • Databases, Factual
  • Humans
  • Latent Class Analysis*
  • Phenotype
  • Sample Size