An efficient hierarchical generalized linear mixed model for pathway analysis of genome-wide association studies

Lily Wang; Peilin Jia; Russell D Wolfinger; Xi Chen; Britney L Grayson; Thomas M Aune; Zhongming Zhao

doi:10.1093/bioinformatics/btq728

Bioinformatics. 2011 Mar 1;27(5):686-92. doi: 10.1093/bioinformatics/btq728. Epub 2011 Jan 25.

Authors

Lily Wang¹, Peilin Jia, Russell D Wolfinger, Xi Chen, Britney L Grayson, Thomas M Aune, Zhongming Zhao

Affiliation

¹ Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA. lily.wang@vanderbilt.edu

Abstract

Motivation: In genome-wide association studies (GWAS) of complex diseases, genetic variants having real but weak associations often fail to be detected at the stringent genome-wide significance level. Pathway analysis, which tests disease association with combined association signals from a group of variants in the same pathway, has become increasingly popular. However, because of the complexities in genetic data and the large sample sizes in typical GWAS, pathway analysis remains challenging. We propose a new statistical model for pathway analysis of GWAS. This model includes a fixed effects component that models the mean disease association for a group of genes, and a random effects component that models how each gene's association with disease varies about the gene group mean; it therefore belongs to the class of mixed effects models.
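To make the fixed-plus-random structure concrete, the following is a minimal sketch and not the authors' implementation (their software is at the URL under Availability). It assumes a simplified linear random-intercept model on per-SNP association scores, y_ij = mu + b_i + e_ij, where mu is the pathway mean (fixed effect), b_i is the deviation of gene i from that mean (random effect) and e_ij is SNP-level noise. It uses a simple moment-based estimator with a normal-approximation p-value, and it ignores LD and overlapping genes, which the proposed model corrects for. The function name, argument names and toy data are all hypothetical.

```python
import math
import numpy as np

def pathway_mean_test(scores, genes):
    """Moment-based test of the pathway mean in y_ij = mu + b_i + e_ij.

    scores : per-SNP association scores (hypothetical input, e.g. z-scores)
    genes  : gene label for each SNP, same length as scores
    Assumes independent SNPs; LD and gene overlap are NOT handled here.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    genes = np.asarray(genes).ravel()
    labels, inverse = np.unique(genes, return_inverse=True)
    k = labels.size
    if k < 2:
        raise ValueError("need at least two genes to estimate gene-level variation")

    # Per-gene SNP counts and per-gene mean scores.
    counts = np.bincount(inverse, minlength=k)
    gene_means = np.bincount(inverse, weights=scores, minlength=k) / counts

    # Fixed effect: unweighted mean of the gene means.
    mu_hat = float(gene_means.mean())

    # The sample variance of the gene means estimates the average of
    # (tau2 + sigma2 / n_i), so s2_between / k estimates Var(mu_hat).
    s2_between = float(gene_means.var(ddof=1))
    se = math.sqrt(s2_between / k)

    # Pooled within-gene (SNP-level) variance.
    resid = scores - gene_means[inverse]
    df_within = scores.size - k
    sigma2 = float(resid @ resid) / df_within if df_within > 0 else 0.0

    # Method-of-moments estimate of the between-gene variance, floored at 0.
    tau2 = max(0.0, s2_between - sigma2 * float(np.mean(1.0 / counts)))

    t_stat = mu_hat / se if se > 0 else float("nan")
    # Two-sided p-value from a normal approximation (crude when k is small).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return {"mu": mu_hat, "se": se, "sigma2": sigma2, "tau2": tau2,
            "t": t_stat, "p": p_value}

# Toy usage with made-up scores for three hypothetical genes.
result = pathway_mean_test(
    scores=[1.0, 3.0, 2.0, 4.0, 0.0, 2.0],
    genes=["G1", "G1", "G2", "G2", "G3", "G3"],
)
print(result)
```

Testing the mean of the gene means against its between-gene spread treats genes, not SNPs, as the unit of replication, which is the point of the random effects component: a single gene with many strongly associated SNPs cannot dominate the pathway-level test.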

Results: The proposed model is computationally efficient and uses only summary statistics. In addition, it corrects for the presence of overlapping genes and linkage disequilibrium (LD). Using simulated and real GWAS data, we showed that our model improved power over currently available pathway analysis methods while preserving the type I error rate. Furthermore, using the WTCCC Type 1 Diabetes (T1D) dataset, we demonstrated that mixed model analysis identified meaningful biological processes that agreed well with previous reports on T1D. The proposed methodology therefore provides an efficient statistical modeling framework for systems analysis of GWAS.

Availability: The software code for mixed models analysis is freely available at http://biostat.mc.vanderbilt.edu/LilyWang.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Computer Simulation
Genome-Wide Association Study / methods*
Genotype
Humans
Linear Models*
Linkage Disequilibrium
Polymorphism, Single Nucleotide
Software*
