Cryptic population genetic structure: the number of inferred clusters depends on sample size

Mol Ecol Resour. 2010 Mar;10(2):314-23. doi: 10.1111/j.1755-0998.2009.02756.x. Epub 2009 Aug 11.

Abstract

Clustering methods have been used extensively to unravel cryptic population genetic structure. We investigated the effect of the number of individuals sampled in each location on the resulting number of clusters. Our study was motivated by recent results in Arabidopsis thaliana: studies in which more than one individual was sampled per location appear to have led to a much higher number of clusters than studies in which only one individual was sampled per location, as is generally done in this species. Using computer simulations and microsatellite data from A. thaliana, we show that the number of sampled individuals indeed has a strong impact on the number of resulting clusters. This effect is smaller if the sampled populations have a hierarchical structure. In most cases, sampling 5-10 individuals per population should be sufficient. The results argue for abandoning the concept of 'accessions' in partially selfing organisms.