Bayesian variable and model selection methods for genetic association studies

Genet Epidemiol. 2009 Jan;33(1):27-37. doi: 10.1002/gepi.20353.

Abstract

Variable selection is growing in importance with the advent of high-throughput genotyping methods, which require the analysis of hundreds to thousands of single nucleotide polymorphisms (SNPs), and with the increased interest in using these genetic studies to better understand common, complex diseases. Until now, the standard approach has been to analyze the genotypes for each SNP individually to look for an association with a disease. Alternatively, combinations of SNPs or haplotypes are analyzed for association. An added complication in studying complex diseases or phenotypes is that genetic risk for the disease is often due to multiple SNPs at various locations on the chromosome, each with a small individual effect, that may collectively have a large effect on the phenotype. Hence, multi-locus SNP models, as opposed to single-SNP models, may better capture the true underlying genotype-phenotype relationship. Thus, innovative methods for determining which SNPs to include in the model are needed. The goal of this article is to describe several methods currently available for variable and model selection using Bayesian approaches and to illustrate their application to genetic association studies using both real and simulated candidate gene data for a complex disease. In particular, Bayesian model averaging (BMA), stochastic search variable selection (SSVS), and Bayesian variable selection (BVS) using reversible jump Markov chain Monte Carlo (MCMC) are illustrated for candidate gene association studies using a study of age-related macular degeneration (AMD) and simulated data.
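To make the idea of model averaging over multi-locus SNP models concrete, the following is a minimal sketch and not the article's implementation. It assumes a quantitative phenotype, additive genotype coding (0, 1, or 2 copies of the minor allele), a uniform prior over models, and the common BIC approximation to each model's marginal likelihood. It also enumerates every subset of SNPs, which is only feasible for a handful of candidate SNPs. The function names `bic_linear` and `bma_inclusion_probs` and the simulated data are hypothetical and chosen for illustration only.

```python
import itertools
import numpy as np


def bic_linear(X, y):
    """BIC of a least-squares fit with Gaussian errors; X already holds an intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def bma_inclusion_probs(G, y):
    """Approximate posterior model weights and per-SNP inclusion probabilities.

    G is an (n, p) genotype matrix coded 0/1/2 and y is a length-n phenotype.
    All 2**p subsets are enumerated, so p must be small.
    """
    n, p = G.shape
    models, bics = [], []
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            X = np.column_stack([np.ones(n)] + [G[:, j] for j in subset])
            models.append(subset)
            bics.append(bic_linear(X, y))
    bics = np.array(bics)
    # exp(-BIC/2) approximates the marginal likelihood; subtract the minimum for stability.
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    # A SNP's inclusion probability is the total weight of the models that contain it.
    incl = np.zeros(p)
    for subset, w in zip(models, weights):
        for j in subset:
            incl[j] += w
    return incl, models, weights


# Simulated example (assumed values): 5 SNPs, of which SNPs 0 and 2 affect the phenotype.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(500, 5)).astype(float)
y = 0.5 * G[:, 0] + 0.4 * G[:, 2] + rng.normal(size=500)

incl, models, weights = bma_inclusion_probs(G, y)
for j, prob in enumerate(incl):
    print(f"SNP {j}: approximate posterior inclusion probability = {prob:.3f}")
```

With hundreds of SNPs, exhaustive enumeration is no longer possible, which is one motivation for stochastic search approaches such as SSVS and reversible jump MCMC that explore the model space by sampling.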

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem*
  • Databases, Genetic
  • Genome-Wide Association Study / statistics & numerical data*
  • Humans
  • Macular Degeneration / epidemiology
  • Macular Degeneration / genetics
  • Markov Chains
  • Models, Genetic*
  • Models, Statistical*
  • Molecular Epidemiology
  • Monte Carlo Method
  • Polymorphism, Single Nucleotide
  • Stochastic Processes