Genome scan methods against more complex models: when and how much should we trust them?

Pierre de Villemereuil; Éric Frichot; Éric Bazin; Olivier François; Oscar E Gaggiotti

doi:10.1111/mec.12705

Mol Ecol. 2014 Apr;23(8):2006-19. doi: 10.1111/mec.12705. Epub 2014 Apr 5.

Authors

Pierre de Villemereuil¹, Éric Frichot, Éric Bazin, Olivier François, Oscar E Gaggiotti

Affiliation

¹ Centre National de la Recherche Scientifique, Université Joseph Fourier, LECA, UMR 5553, 2233 rue de la piscine, 38400, Saint Martin d'Hères, France.

PMID: 24611968
DOI: 10.1111/mec.12705

Abstract

The recent availability of next-generation sequencing (NGS) has made it possible to use dense genetic markers to identify regions of the genome that may be under selection. Several statistical methods have recently been developed for this purpose. Here, we present the results of an individual-based simulation study investigating the power and error rate of popular or recent genome scan methods: linear regression, Bayescan, BayEnv and LFMM. Unlike previous studies, we focus on complex, hierarchical population structure and on polygenic selection. Additionally, we use a false discovery rate (FDR)-based framework, which provides a unified testing framework across frequentist and Bayesian methods. Finally, we investigate how specifying the data as population allele frequencies versus individual genotypes influences LFMM and the linear regression. Considering polygenic selection, rather than a monogenic scenario, changes the relative ranking of the methods. For strongly hierarchical scenarios with confounding effects between demography and environmental variables, the power of the methods can be very low. Except for one scenario, Bayescan exhibited moderate power and error rate. BayEnv performed well under nonhierarchical scenarios, while LFMM provided the best compromise between power and error rate across scenarios. We found that error rates can be greatly reduced by considering the results of all three methods when identifying outlier loci.
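The FDR-based framework mentioned above puts frequentist and Bayesian outputs on a common footing, and the final sentence suggests combining several methods' outlier calls. The sketch below illustrates both ideas under stated assumptions; it is not the authors' code or necessarily the exact procedure used in the paper. It assumes that frequentist methods (e.g. linear regression, LFMM) yield one p-value per locus, handled with the Benjamini-Hochberg step-up procedure. It also assumes that each Bayesian method's output has already been converted to a per-locus posterior probability of the neutral (null) model. Those loci are then selected as the largest set whose mean posterior probability of neutrality, a standard Bayesian FDR estimate, stays at or below q. The function names and the `consensus` rule (keep loci flagged by at least `min_methods` methods) are hypothetical illustrations.

```python
from collections import Counter
from typing import List, Sequence, Set


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> Set[int]:
    """Return indices of loci called as outliers by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # loci by ascending p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # largest rank k with p_(k) <= k * q / m
    return set(order[:k_max])  # reject all hypotheses up to rank k_max


def bayesian_fdr(post_null: Sequence[float], q: float = 0.05) -> Set[int]:
    """Return indices of loci whose mean posterior probability of neutrality is <= q.

    Assumes post_null[i] is P(neutral | data) for locus i (hypothetical input).
    """
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])  # strongest evidence first
    selected: List[int] = []
    running = 0.0
    for i in order:
        running += post_null[i]
        # Mean of an ascending prefix never decreases, so we can stop at the first violation.
        if running / (len(selected) + 1) > q:
            break
        selected.append(i)
    return set(selected)


def consensus(calls: Sequence[Set[int]], min_methods: int) -> Set[int]:
    """Keep loci flagged by at least `min_methods` of the methods (hypothetical rule)."""
    counts = Counter(locus for method_calls in calls for locus in method_calls)
    return {locus for locus, n in counts.items() if n >= min_methods}


if __name__ == "__main__":
    # Toy values for five loci, invented purely for illustration.
    freq_calls = benjamini_hochberg([0.001, 0.2, 0.004, 0.025, 0.8], q=0.05)
    bayes_calls = bayesian_fdr([0.01, 0.9, 0.02, 0.3, 0.5], q=0.05)
    other_calls = {0, 3}
    print(freq_calls, bayes_calls)
    print(consensus([freq_calls, bayes_calls, other_calls], min_methods=3))
```

Requiring agreement among all methods (`min_methods` equal to the number of methods) trades power for a lower error rate. A majority rule is a less conservative alternative.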

Keywords: Bayesian methods; adaptation; false discovery rate; genome scan; power simulation study.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bayes Theorem*
Computer Simulation
Data Interpretation, Statistical
Gene Frequency
Gene-Environment Interaction
Genetics, Population / methods*
Genotype
Linear Models
Models, Genetic*
Polymorphism, Single Nucleotide