GWIS--model-free, fast and exhaustive search for epistatic interactions in case-control GWAS

BMC Genomics. 2013;14 Suppl 3(Suppl 3):S10. doi: 10.1186/1471-2164-14-S3-S10. Epub 2013 May 28.

Abstract

Background: It has been hypothesized that multivariate analysis and systematic detection of epistatic interactions between explanatory genotyping variables may help resolve the problem of "missing heritability" currently observed in genome-wide association studies (GWAS). However, even the simplest bivariate analysis is still held back by significant statistical and computational challenges that are often addressed by reducing the set of analysed markers. Theoretically, it has been shown that combinations of loci may exist that show weak or no effects individually, but show significant (even complete) explanatory power over phenotype when combined. Reducing the set of analysed SNPs before bivariate analysis could easily omit such critical loci.

Results: We have developed an exhaustive bivariate GWAS analysis methodology that yields a manageable subset of candidate marker pairs for subsequent analysis using other, often more computationally expensive techniques. Our model-free filtering approach is based on classification using ROC curve analysis, an alternative to much slower regression-based modelling techniques. Exhaustive analysis of studies containing approximately 450,000 SNPs and 5,000 samples requires only 2 hours using a desktop CPU or 13 minutes using a GPU (Graphics Processing Unit). We validate our methodology with analysis of simulated datasets as well as the seven Wellcome Trust Case-Control Consortium datasets that represent a wide range of real life GWAS challenges. We have identified SNP pairs that have considerably stronger association with disease than their individual component SNPs that often show negligible effect univariately. When compared against previously reported results in the literature, our methods re-detect most significant SNP-pairs and additionally detect many pairs absent from the literature that show strong association with disease. The high overlap suggests that our fast analysis could substitute for some slower alternatives.

Conclusions: We demonstrate that the proposed methodology is robust, fast and capable of exhaustive search for epistatic interactions using a standard desktop computer. First, our implementation is significantly faster than timings for comparable algorithms reported in the literature, especially as our method allows simultaneous use of multiple statistical filters with low computing time overhead. Second, for some diseases, we have identified hundreds of SNP pairs that pass formal multiple test (Bonferroni) correction and could form a rich source of hypotheses for follow-up analysis.

Availability: A web-based version of the software used for this analysis is available at http://bioinformatics.research.nicta.com.au/gwis.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Computer Simulation
  • Epistasis, Genetic / genetics*
  • Genome-Wide Association Study / methods*
  • Humans
  • Models, Genetic*
  • Polymorphism, Single Nucleotide / genetics*
  • ROC Curve
  • Sensitivity and Specificity
  • Software*
  • Time Factors