Haplotype based testing for a better understanding of the selective architecture

BMC Bioinformatics. 2023 Aug 26;24(1):322. doi: 10.1186/s12859-023-05437-3.

Abstract

Background: The identification of genomic regions affected by selection is one of the most important goals in population genetics. If temporal data are available, allele frequency changes at SNP positions are often used for this purpose. Here we provide a new testing approach that uses haplotype frequencies instead of allele frequencies.

Results: Using simulated data, we show that, compared to SNP based tests, our approach has higher power, especially when the number of candidate haplotypes is small or moderate. To improve power when the number of haplotypes is large, we investigate methods that combine them into a moderate number of haplotype subsets. Haplotype frequencies can often be recovered with less noise than SNP frequencies, especially under pool sequencing, giving our test an additional advantage. Furthermore, spurious outlier SNPs may lead to false positives, a problem usually not encountered when working with haplotypes. Post hoc tests for the number of selected haplotypes and for differences between their selection coefficients are also provided for a better understanding of the underlying selection dynamics. An application to a real data set further illustrates the performance benefits.

Conclusions: Because it requires less multiple testing correction and reduces noise, haplotype based testing can outperform SNP based tests in terms of power in most scenarios.
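
The multiple testing argument can be made concrete with a minimal sketch. It assumes a Bonferroni correction, which is a common choice but is not stated in the abstract as the correction the authors use. The counts of haplotypes and SNPs are hypothetical values chosen only for illustration, and the function name `bonferroni_threshold` is likewise an assumption.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


alpha = 0.05          # family-wise error rate to control
n_haplotypes = 10     # hypothetical number of candidate haplotypes in a region
n_snps = 10_000       # hypothetical number of SNPs in the same region

# Fewer tests means each individual test faces a less stringent threshold.
hap_threshold = bonferroni_threshold(alpha, n_haplotypes)
snp_threshold = bonferroni_threshold(alpha, n_snps)

print(f"per-test threshold, haplotypes: {hap_threshold:.2e}")
print(f"per-test threshold, SNPs:       {snp_threshold:.2e}")
```

Under these assumed counts, each haplotype test is compared against a threshold 1000 times larger than each SNP test, which is one way a smaller number of tests can translate into higher power.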

Keywords: Evolve and resequence; Experimental evolution; Haplotype; Hypothesis test; Post hoc test; Selection.

MeSH terms

  • Gene Frequency
  • Genomics*
  • Haplotypes
  • Polymorphism, Single Nucleotide*