Allele frequency spectra in structured populations: Novel-allele probabilities under the labelled coalescent

Theor Popul Biol. 2020 Jun:133:130-140. doi: 10.1016/j.tpb.2020.01.002. Epub 2020 Mar 3.

Abstract

We address the effect of population structure on key properties of the Ewens sampling formula. We use our previously introduced inductive method for determining exact allele frequency spectrum (AFS) probabilities under the infinite-allele model of mutation and population structure for samples of arbitrary size. Fundamental to the sampling distribution is the novel-allele probability: the probability that, given the pattern of variation in the present sample, the next gene sampled belongs to an as-yet-unobserved allelic class. Unlike the case for panmictic populations, the novel-allele probability in a structured population depends on the AFS of the present sample. We derive a recursion that directly provides the marginal novel-allele probability across AFSs, obviating the need to first determine the probability of each AFS. Our explorations suggest that the marginal novel-allele probability tends to be greater for initial samples comprising fewer alleles and for sampling configurations in which the next-observed gene derives from a deme different from that of the majority of the present sample. Comparison to the efficient importance sampling proposals developed by De Iorio and Griffiths and colleagues indicates that their approximation for the novel-allele probability generally agrees with the true marginal, although it may tend to overestimate the marginal in cases in which the novel-allele probability is high and migration rates are low.
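For orientation, the panmictic baseline that the abstract contrasts against can be checked numerically. The sketch below is not the paper's recursion for structured populations. It only evaluates the classical Ewens sampling formula for a single panmictic population and confirms that the conditional novel-allele probability equals θ/(θ + n) whatever the AFS of the present sample. The function names, the scaled mutation rate `theta`, and the example sample size are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from math import factorial


def esf_probability(afs, theta):
    """Ewens sampling formula for a panmictic population.

    afs[j-1] is the number of alleles observed exactly j times;
    theta is the scaled mutation rate (illustrative parameter).
    """
    theta = Fraction(theta)
    n = sum(j * a_j for j, a_j in enumerate(afs, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1)
    rising = Fraction(1)
    for k in range(n):
        rising *= theta + k
    prob = Fraction(factorial(n)) / rising
    for j, a_j in enumerate(afs, start=1):
        prob *= theta ** a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


def novel_allele_probability(afs, theta):
    """P(next gene is a novel allele | AFS of the present sample).

    Uses exchangeability: the joint event "first n genes have this AFS and
    gene n + 1 is novel" equals "the n + 1 genes have the AFS with one extra
    singleton, and the last gene is one of those singletons".
    """
    n = sum(j * a_j for j, a_j in enumerate(afs, start=1))
    extended = [afs[0] + 1] + list(afs[1:]) + [0]
    last_is_singleton = Fraction(extended[0], n + 1)
    joint = esf_probability(extended, theta) * last_is_singleton
    return joint / esf_probability(afs, theta)


if __name__ == "__main__":
    theta = Fraction(3, 2)
    # All five AFSs for a sample of n = 4 genes
    for afs in [(4, 0, 0, 0), (2, 1, 0, 0), (0, 2, 0, 0), (1, 0, 1, 0), (0, 0, 0, 1)]:
        # Each conditional probability equals theta / (theta + 4)
        print(afs, novel_allele_probability(afs, theta))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equality with θ/(θ + n) can be verified without floating-point tolerance. Under population structure this invariance fails, which is why the abstract works with the marginal novel-allele probability across AFSs.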

Keywords: Allele frequency spectrum; Coalescence; Ewens sampling formula; Importance sampling; Infinite-allele mutation; Population structure.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Alleles
  • Gene Frequency
  • Genetics, Population*
  • Models, Genetic*
  • Mutation
  • Probability