An efficient hierarchical generalized linear mixed model for pathway analysis of genome-wide association studies

Bioinformatics. 2011 Mar 1;27(5):686-92. doi: 10.1093/bioinformatics/btq728. Epub 2011 Jan 25.

Abstract

Motivation: In genome-wide association studies (GWAS) of complex diseases, genetic variants having real but weak associations often fail to be detected at the stringent genome-wide significance level. Pathway analysis, which tests disease association with combined association signals from a group of variants in the same pathway, has become increasingly popular. However, because of the complexities in genetic data and the large sample sizes in typical GWAS, pathway analysis remains to be challenging. We propose a new statistical model for pathway analysis of GWAS. This model includes a fixed effects component that models mean disease association for a group of genes, and a random effects component that models how each gene's association with disease varies about the gene group mean, thus belongs to the class of mixed effects models.

Results: The proposed model is computationally efficient and uses only summary statistics. In addition, it corrects for the presence of overlapping genes and linkage disequilibrium (LD). Via simulated and real GWAS data, we showed our model improved power over currently available pathway analysis methods while preserving type I error rate. Furthermore, using the WTCCC Type 1 Diabetes (T1D) dataset, we demonstrated mixed model analysis identified meaningful biological processes that agreed well with previous reports on T1D. Therefore, the proposed methodology provides an efficient statistical modeling framework for systems analysis of GWAS.

Availability: The software code for mixed models analysis is freely available at http://biostat.mc.vanderbilt.edu/LilyWang.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Genome-Wide Association Study / methods*
  • Genotype
  • Humans
  • Linear Models*
  • Linkage Disequilibrium
  • Polymorphism, Single Nucleotide
  • Software*