Novel genetic matching methods for handling population stratification in genome-wide association studies

BMC Bioinformatics. 2015 Mar 14:16:84. doi: 10.1186/s12859-015-0521-4.

Abstract

Background: A commonly encountered problem in association studies is population stratification. In this work, we propose a novel framework for population matching in the context of genome-wide and sequencing association studies. We employ pairwise and groupwise optimal case-control matchings and present an agglomerative hierarchical clustering, both based on a genetic similarity score matrix. To ensure that the matches produced by the matching algorithm correctly capture the population structure, we propose and discuss two stratum validation methods. We also introduce an extension of the Cochran-Armitage trend test that explicitly takes the particular population structure into account.
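To make the idea of optimal pairwise case-control matching on a genetic similarity score matrix concrete, the following is a minimal sketch and not the authors' implementation. It assumes genotypes coded as minor-allele counts (0, 1, 2), uses a plain identity-by-state score as a stand-in for the paper's similarity score, and finds the best 1:1 matching by exhaustive search, which is only feasible for very small samples. The function names `ibs_similarity` and `optimal_pair_matching` are hypothetical.

```python
from itertools import permutations


def ibs_similarity(g1, g2):
    """Mean identity-by-state similarity in [0, 1] between two genotype
    vectors coded as minor-allele counts (0, 1 or 2).
    Assumption: a stand-in score, not the score matrix used in the paper."""
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    return sum(2 - abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def optimal_pair_matching(cases, controls):
    """Return the 1:1 case-control matching with maximal total similarity.
    Exhaustive search over all assignments: illustrative only, since the
    number of assignments grows factorially with the sample size."""
    if len(cases) > len(controls):
        raise ValueError("need at least as many controls as cases")
    # Similarity score matrix: rows are cases, columns are controls.
    sim = [[ibs_similarity(c, k) for k in controls] for c in cases]
    best_total, best_assignment = -1.0, None
    for perm in permutations(range(len(controls)), len(cases)):
        total = sum(sim[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_assignment = total, perm
    # Each pair is (case index, control index).
    return list(enumerate(best_assignment)), best_total


# Toy data: two cases and three candidate controls, four SNPs each.
cases = [[0, 0, 1, 2], [2, 2, 1, 0]]
controls = [[2, 2, 2, 0], [0, 0, 1, 1], [1, 1, 1, 1]]
pairs, total = optimal_pair_matching(cases, controls)
print(pairs, total)
```

For realistic sample sizes, a polynomial-time assignment or graph matching algorithm would replace the exhaustive search; the objective (maximal total similarity over matched pairs) stays the same.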

Results: We assess our framework by simulating genotype data under the null hypothesis, to confirm that it correctly controls the type I error rate. In a power study, we show that structured association testing using our framework displays reasonable power. We compare our results with those obtained from a logistic regression model with principal component covariates. Using the principal component approach, we also find a possible false-positive association with Alzheimer's disease, which is supported neither by our new methods, nor by the results of a recent large meta-analysis, nor by a mixed model approach.
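As an illustration of what a structured (stratum-aware) trend test can look like, the sketch below computes the standard Cochran-Armitage score and variance within each stratum and pools them across strata into one chi-square statistic with one degree of freedom. This pooling is a generic, Mantel-style combination chosen for illustration; it is an assumption and not necessarily the extension developed in the paper. The function names and the additive weights (0, 1, 2) are likewise assumptions.

```python
import math

WEIGHTS = (0, 1, 2)  # additive coding of the three genotype classes (assumption)


def trend_components(case_counts, control_counts, weights=WEIGHTS):
    """Cochran-Armitage score and its null variance for one stratum.
    case_counts / control_counts: individuals per genotype class."""
    n = [r + s for r, s in zip(case_counts, control_counts)]
    total_cases = sum(case_counts)
    total = sum(n)
    total_controls = total - total_cases
    if total == 0:
        return 0.0, 0.0
    tn = sum(t * ni for t, ni in zip(weights, n))
    t2n = sum(t * t * ni for t, ni in zip(weights, n))
    # Observed minus expected weighted case count.
    score = sum(t * r for t, r in zip(weights, case_counts)) - total_cases * tn / total
    variance = total_cases * total_controls * (total * t2n - tn ** 2) / total ** 3
    return score, variance


def stratified_trend_test(strata):
    """Pool per-stratum scores and variances into one 1-df chi-square test.
    strata: iterable of (case_counts, control_counts) pairs."""
    total_score = 0.0
    total_var = 0.0
    for case_counts, control_counts in strata:
        score, variance = trend_components(case_counts, control_counts)
        total_score += score
        total_var += variance
    if total_var == 0:
        raise ValueError("no informative strata (zero variance)")
    chi2 = total_score ** 2 / total_var
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy example: two strata, each given as (case counts, control counts).
strata = [((10, 20, 30), (30, 20, 10)), ((5, 10, 5), (5, 10, 5))]
print(stratified_trend_test(strata))
```

Because the scores are centered within each stratum, differences in allele frequency between strata do not contribute to the statistic; only within-stratum case-control differences do.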

Conclusions: Matching methods provide an alternative way of handling confounding due to population stratification for statistical tests in which covariates are hard to model. As a benchmark, we show that our matching framework performs as well as state-of-the-art models on common variants.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alzheimer Disease / genetics*
  • Case-Control Studies
  • Cluster Analysis*
  • Genetics, Population*
  • Genome-Wide Association Study / methods*
  • Genotype
  • Humans
  • Logistic Models*
  • Population Groups