A new inference method for detecting an ongoing selective sweep

Genes Genet Syst. 2018 Nov 10;93(4):149-161. doi: 10.1266/ggs.18-00008. Epub 2018 Sep 30.

Abstract

A simple method was developed to detect signatures of ongoing selective sweeps in single nucleotide polymorphism (SNP) data. Based largely on the traditional site frequency spectrum (SFS), the method additionally incorporates linkage disequilibrium (LD) between pairs of SNP sites and uniquely represents both SFS and LD information as hierarchical "barcodes." This barcode representation allows the identification of a hitchhiking genomic region surrounding a putative target site of positive selection, or core site. Sweep signals at linked neutral sites are then measured by the proportion (Fc) of derived alleles within the hitchhiking region that are linked in the derived allele group defined at the core site. For Fc, or intra-allelic variability, to be measured informatively, certain conditions on derived allele frequencies must be met, as illustrated with the human ST8SIA2 locus. Coalescent simulators with and without positive selection were used to assess the false-positive and false-negative rates of the Fc statistic. To demonstrate its power, the method was further applied to the LCT, OCA2, EDAR, SLC24A5 and ASPM loci, which are known to have undergone positive selection in human populations. Overall, the method is powerful and can be used to identify core sites responsible for ongoing selective sweeps.
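
As a rough illustration of the kind of quantity Fc describes, the Python sketch below computes, for a toy haplotype table, the fraction of derived allele copies at linked sites that sit on haplotypes carrying the derived allele at the core site. This is only one possible reading of the definition given in the abstract and is not the authors' implementation. The function name fc_statistic, the 0/1 haplotype encoding, and the choice to count allele copies rather than sites are all assumptions made for illustration. The barcode step that delimits the hitchhiking region is not modeled; the region is simply supplied by hand.

```python
def fc_statistic(haplotypes, core, region):
    """Toy Fc-like proportion (hypothetical helper, not the published code).

    haplotypes: list of equal-length lists of 0/1, where 1 marks a derived
                allele (assumed encoding).
    core:       column index of the putative core site.
    region:     iterable of column indices treated as the hitchhiking region
                (assumed to be given; the barcode step is not modeled here).
    Returns the share of derived allele copies at non-core sites in the
    region that lie on haplotypes carrying the derived core allele, or
    None if the region holds no derived copies.
    """
    linked_sites = [s for s in region if s != core]

    # Haplotypes in the derived allele group defined at the core site.
    derived_group = [h for h in haplotypes if h[core] == 1]

    # Derived allele copies at linked sites, over all haplotypes
    # and over the derived group only.
    total = sum(h[s] for h in haplotypes for s in linked_sites)
    in_group = sum(h[s] for h in derived_group for s in linked_sites)

    if total == 0:
        return None  # proportion is undefined without derived copies
    return in_group / total


# Toy data: 4 haplotypes, 4 SNP sites, core site at column 1.
toy_haplotypes = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
toy_fc = fc_statistic(toy_haplotypes, core=1, region=range(4))
print(toy_fc)
```

In this toy table, four of the five derived copies at linked sites fall on haplotypes that carry the derived core allele, so the proportion is high. Under the assumptions above, a value near 1 would correspond to derived variation concentrated in the derived allele group, which is the pattern the abstract associates with an ongoing sweep.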

Keywords: hitchhiking; human evolution; linkage disequilibrium; population genomics; site frequency spectrum.

MeSH terms

  • Genome, Human
  • Genome-Wide Association Study / methods*
  • Genome-Wide Association Study / standards
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic*
  • Polymorphism, Single Nucleotide
  • Selection, Genetic*
  • Sensitivity and Specificity
  • Sialyltransferases / genetics

Substances

  • CMP-N-acetylneuraminate-poly-alpha-2,8-sialosyl sialyltransferase
  • Sialyltransferases