Conditions under which genome-wide association studies will be positively misleading

Genetics. 2010 Nov;186(3):1045-52. doi: 10.1534/genetics.110.121665. Epub 2010 Sep 2.

Abstract

Genome-wide association mapping is a popular method for using natural variation within a species to generate a genotype-phenotype map. Statistical association between an allele at a locus and the trait in question is used as evidence that variation at the locus is responsible for variation of the trait. Indirect association, however, can give rise to statistically significant results at loci unrelated to the trait. We use a haploid, three-locus, binary genetic model to describe the conditions under which these indirect associations become stronger than any of the causative associations in the organism, even to the point of representing the only associations present in the data. These indirect associations are the result of disequilibrium between multiple factors affecting a single trait. Epistasis and population structure can exacerbate the problem but are not required to create it. From a statistical point of view, indirect associations are true associations rather than the result of stochastic noise: they will not be ameliorated by increasing sample size or marker density and can be reproduced in independent studies.
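The core claim can be illustrated with a small numerical sketch. This is not the authors' model or code. It assumes a haploid population with two causal loci (A and B) and one non-causal marker (C), a purely additive trait equal to A + B (no epistasis, no population structure), and hypothetical haplotype frequencies chosen so that C is in disequilibrium with both causal loci while A and B are in negative disequilibrium with each other. Association is measured as the squared correlation between allele and trait, computed exactly from the population frequencies, so sampling noise plays no role.

```python
from math import isclose

# Hypothetical population haplotype frequencies for loci (A, B, C).
# Illustrative values only, not taken from the paper.
HAPLOTYPE_FREQ = {
    (0, 0, 0): 0.28, (0, 0, 1): 0.02,
    (1, 0, 0): 0.02, (1, 0, 1): 0.29,
    (0, 1, 0): 0.02, (0, 1, 1): 0.29,
    (1, 1, 0): 0.01, (1, 1, 1): 0.07,
}


def trait(a, b, c):
    """Assumed additive trait: A and B are causal, C has no effect."""
    return a + b


def expectation(f):
    """Population mean of f(a, b, c) over the haplotype distribution."""
    return sum(p * f(*h) for h, p in HAPLOTYPE_FREQ.items())


def r_squared(locus):
    """Squared correlation between the allele at `locus` and the trait."""
    mean_x = expectation(lambda *h: h[locus])
    mean_y = expectation(trait)
    cov = expectation(lambda *h: h[locus] * trait(*h)) - mean_x * mean_y
    var_x = expectation(lambda *h: h[locus] ** 2) - mean_x ** 2
    var_y = expectation(lambda *h: trait(*h) ** 2) - mean_y ** 2
    return cov ** 2 / (var_x * var_y)


# Frequencies must form a valid probability distribution.
assert isclose(sum(HAPLOTYPE_FREQ.values()), 1.0)

R2 = {name: r_squared(i) for i, name in enumerate("ABC")}
for name, value in R2.items():
    print(f"locus {name}: r^2 with trait = {value:.3f}")
```

Under these assumed frequencies the non-causal marker C shows a larger squared correlation with the trait than either causal locus. Because the values are computed from the population frequencies themselves rather than from a finite sample, a larger sample would only estimate the same ranking more precisely.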

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Arabidopsis / genetics*
  • Artifacts
  • Computer Simulation
  • Genome, Plant / genetics*
  • Genome-Wide Association Study / methods*
  • Geography
  • Models, Genetic
  • Phenotype
  • Polymorphism, Single Nucleotide / genetics