Purging putative siblings from population genetic data sets: a cautionary view

Mol Ecol. 2017 Mar;26(5):1211-1224. doi: 10.1111/mec.14022. Epub 2017 Feb 6.

Abstract

Interest has surged recently in removing siblings from population genetic data sets before conducting downstream analyses. However, even if the pedigree is inferred correctly, this has the potential to do more harm than good. We used computer simulations and empirical samples of coho salmon to evaluate strategies for adjusting samples to account for family structure. We compared performance in full samples and sibling-reduced samples of estimators of allele frequency (P̂), population differentiation (F̂ST) and effective population size (N̂e).

Results: (i) unless simulated samples included large family groups together with a component of unrelated individuals, removing siblings generally reduced precision of P̂ and F̂ST; (ii) N̂e based on the linkage disequilibrium method was largely unbiased using full random samples but became increasingly upwardly biased under aggressive purging of siblings. Under nonrandom sampling (some families over-represented), N̂e using full samples was downwardly biased; removing just the right 'Goldilocks' fraction of siblings could produce an unbiased estimate, but this sweet spot varied widely among scenarios; (iii) weighting individuals based on the inferred pedigree (to produce a best linear unbiased estimator, BLUE) maximized precision of P̂ when the inferred pedigree was correct but performed poorly when the pedigree was wrong; (iv) a variant of sibling removal that leaves intact small sibling groups appears to be more robust to errors in inferences about family structure. Our results illustrate the complex challenges posed by presence of family structure, suggest that no single optimal solution exists and argue for caution in adjusting population genetic data sets for the presence of putative siblings without fully understanding the consequences.
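The sample adjustments compared in results (iii) and (iv) can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on several assumptions: every individual carries an inferred family label, families are unrelated to one another, members of a family are non-inbred full siblings (pairwise relatedness r = 0.5), and an oversized family is reduced to a single representative. The function names, the `keep_intact_up_to` threshold and the toy data are hypothetical. Under these assumptions the BLUE weight of an individual in a family of size n is proportional to 1 / (1 + (n - 1) r).

```python
from collections import Counter, defaultdict


def purge_siblings(family_of, keep_intact_up_to=1):
    """Return the individuals kept after sibling removal.

    Families with at most `keep_intact_up_to` members are left whole;
    larger families are cut to one representative (an assumption of
    this sketch). The default of 1 keeps one individual per family.
    """
    groups = defaultdict(list)
    for individual, family in family_of.items():
        groups[family].append(individual)
    kept = []
    for members in groups.values():
        if len(members) <= keep_intact_up_to:
            kept.extend(members)
        else:
            kept.append(members[0])
    return kept


def blue_weights(family_of, r=0.5):
    """Pedigree-based weights, assuming unrelated full-sib families.

    With a block-diagonal relatedness matrix, each member of a family
    of size n gets raw weight 1 / (1 + (n - 1) * r); weights are then
    normalised to sum to 1.
    """
    sizes = Counter(family_of.values())
    raw = {
        individual: 1.0 / (1.0 + (sizes[family] - 1) * r)
        for individual, family in family_of.items()
    }
    total = sum(raw.values())
    return {individual: w / total for individual, w in raw.items()}


def weighted_allele_frequency(allele_counts, weights):
    """Weighted frequency of one allele; counts are 0, 1 or 2 copies."""
    return sum(weights[i] * allele_counts[i] / 2.0 for i in allele_counts)


# Hypothetical inferred pedigree: one family of 4, one of 2, one singleton.
family_of = {"a1": "A", "a2": "A", "a3": "A", "a4": "A",
             "b1": "B", "b2": "B", "c1": "C"}
# Hypothetical genotypes: copies of the focal allele per individual.
allele_counts = {"a1": 2, "a2": 2, "a3": 1, "a4": 2,
                 "b1": 0, "b2": 1, "c1": 0}

strict = purge_siblings(family_of)                        # one per family
lenient = purge_siblings(family_of, keep_intact_up_to=2)  # pairs left intact
weights = blue_weights(family_of)
p_naive = sum(allele_counts.values()) / (2.0 * len(allele_counts))
p_blue = weighted_allele_frequency(allele_counts, weights)
```

Because the weights come entirely from the inferred family labels, a mislabelled family shifts them directly, which is consistent with the sensitivity to pedigree error reported in (iii). Setting `r=0` makes every weight equal and recovers the unweighted estimate.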

Keywords: allele frequency; effective population size; family structure; genetic differentiation; precision.

Publication types

  • News

MeSH terms

  • Animals
  • Computer Simulation
  • Gene Frequency*
  • Genetics, Population*
  • Linkage Disequilibrium
  • Models, Genetic*
  • Pedigree
  • Salmon / genetics*
  • Siblings*