Genome scan methods against more complex models: when and how much should we trust them?

Mol Ecol. 2014 Apr;23(8):2006-19. doi: 10.1111/mec.12705. Epub 2014 Apr 5.

Abstract

The recent availability of next-generation sequencing (NGS) has made it possible to use dense genetic markers to identify regions of the genome that may be under the influence of selection. Several statistical methods have been developed recently for this purpose. Here, we present the results of an individual-based simulation study investigating the power and error rate of popular or recent genome scan methods: linear regression, Bayescan, BayEnv and LFMM. In contrast to previous studies, we focus on complex, hierarchical population structure and on polygenic selection. Additionally, we use a false discovery rate (FDR)-based framework, which provides a unified testing framework across frequentist and Bayesian methods. Finally, we investigate how specifying the data as population allele frequencies versus individual genotypes influences LFMM and linear regression. The relative ranking of the methods changes when polygenic selection is considered instead of a monogenic scenario. For strongly hierarchical scenarios with confounding effects between demography and environmental variables, the power of the methods can be very low. Except for one scenario, Bayescan exhibited moderate power and error rate. BayEnv performed well under nonhierarchical scenarios, while LFMM provided the best compromise between power and error rate across scenarios. We found that it is possible to greatly reduce error rates by considering the results of all three methods when identifying outlier loci.
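The idea of controlling the FDR per method and then retaining only loci flagged by every method can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each method has already been reduced to one p-value per locus (a simplification, since Bayesian methods typically report posterior probabilities or q-values instead), it substitutes the standard Benjamini-Hochberg step-up procedure for whatever FDR estimator the study used, and the function names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the set of locus indices declared significant by the
    Benjamini-Hochberg step-up procedure at the given FDR level."""
    m = len(p_values)
    # Locus indices sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k such that p_(k) <= fdr * k / m.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return set(order[:cutoff_rank])


def consensus_outliers(*outlier_sets):
    """Return loci flagged by every method (set intersection)."""
    return set.intersection(*outlier_sets) if outlier_sets else set()


# Hypothetical p-values for six loci from three unnamed methods
# (illustrative numbers only, not results from the study).
p_method_a = [0.001, 0.2, 0.004, 0.8, 0.03, 0.5]
p_method_b = [0.002, 0.01, 0.6, 0.9, 0.7, 0.4]
p_method_c = [0.0005, 0.3, 0.002, 0.7, 0.9, 0.6]

flagged_a = benjamini_hochberg(p_method_a)
flagged_b = benjamini_hochberg(p_method_b)
flagged_c = benjamini_hochberg(p_method_c)

consensus = consensus_outliers(flagged_a, flagged_b, flagged_c)
print(sorted(consensus))
```

Taking the intersection can only remove loci from each method's list, so it trades some power for a lower error rate, which matches the direction of the effect described in the abstract.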

Keywords: Bayesian methods; adaptation; false discovery rate; genome scan; power simulation study.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Gene Frequency
  • Gene-Environment Interaction
  • Genetics, Population / methods*
  • Genotype
  • Linear Models
  • Models, Genetic*
  • Polymorphism, Single Nucleotide