Fast and accurate joint inference of coancestry parameters for populations and/or individuals

PLoS Genet. 2023 Jan 19;19(1):e1010054. doi: 10.1371/journal.pgen.1010054. eCollection 2023 Jan.

Abstract

We introduce a fast, new algorithm for inferring from allele count data the FST parameters describing genetic distances among a set of populations and/or unrelated diploid individuals, and a tree with branch lengths corresponding to FST values. The tree can reflect historical processes of splitting and divergence, but seeks to represent the actual genetic variance as accurately as possible with a tree structure. We generalise two major approaches to defining FST, via correlations and mismatch probabilities of sampled allele pairs, which measure shared and non-shared components of genetic variance. A diploid individual can be treated as a population of two gametes, which allows inference of coancestry coefficients for individuals as well as for populations, or a combination of the two. A simulation study illustrates that our fast method-of-moments estimation of FST values, simultaneously for multiple populations/individuals, gains statistical efficiency over pairwise approaches when the population structure is close to tree-like. We apply our approach to genome-wide genotypes from the 26 worldwide human populations of the 1000 Genomes Project. We first analyse the data at the population level, then analyse a subset of individuals, and finally pool individuals from the more homogeneous populations. This flexible analysis approach gives advantages over traditional approaches to population structure/coancestry, including visual and quantitative assessments of long-standing questions about the relative magnitudes of within- and between-population genetic differences.
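To make the mismatch-probability view of FST concrete, the following is a minimal sketch of a generic pairwise moment estimator for biallelic loci. It is not the joint multi-population algorithm the abstract describes, which the abstract does not specify in detail; it is the familiar "one minus within-population mismatch over between-population mismatch" ratio, summed over loci. All function names and the input layout (lists of reference-allele counts and sample sizes per locus) are assumptions made for illustration.

```python
def within_mismatch(x, n):
    """Unbiased estimate of the probability that two distinct alleles
    drawn from one population differ, given x reference alleles out of n."""
    return 2.0 * x * (n - x) / (n * (n - 1))


def between_mismatch(x1, n1, x2, n2):
    """Probability that one allele from each of two populations differ,
    using the sample allele frequencies."""
    p1, p2 = x1 / n1, x2 / n2
    return p1 * (1.0 - p2) + p2 * (1.0 - p1)


def pairwise_fst(counts1, sizes1, counts2, sizes2):
    """Ratio-of-sums moment estimate: 1 - (within mismatch) / (between mismatch).

    counts*: reference-allele counts per locus (hypothetical input layout).
    sizes*:  number of sampled alleles per locus (at least 2 each).
    """
    within_sum = 0.0
    between_sum = 0.0
    for x1, n1, x2, n2 in zip(counts1, sizes1, counts2, sizes2):
        # Average the two within-population mismatch estimates at this locus.
        within_sum += 0.5 * (within_mismatch(x1, n1) + within_mismatch(x2, n2))
        between_sum += between_mismatch(x1, n1, x2, n2)
    if between_sum == 0.0:
        raise ValueError("no between-population mismatches; estimate undefined")
    return 1.0 - within_sum / between_sum


# A diploid individual treated as a population of two gametes:
# the allele count is the genotype (0, 1 or 2) and the sample size is 2.
individual_a = [2, 0, 1]
individual_b = [0, 2, 1]
fst_individuals = pairwise_fst(individual_a, [2, 2, 2], individual_b, [2, 2, 2])
```

With a sample size of two, the within-population mismatch reduces to an indicator of heterozygosity, so the same function covers individuals, populations, or a mix of the two. Because the within-population term is bias-corrected, the estimate can be slightly negative when two samples have near-identical allele frequencies.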

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Alleles
  • Computer Simulation
  • Genetics, Population*
  • Genotype
  • Humans

Grants and funding

This research was partially supported by grant DP210102168 from the Australian Research Council to DB, and by the “Investissement d’Avenir” project (Amaizing, ANR-10-BTBR-0001) to TMH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.