Linked-read sequencing enables haplotype-resolved resequencing at population scale

Mol Ecol Resour. 2020 Sep;20(5):1311-1322. doi: 10.1111/1755-0998.13192. Epub 2020 Jun 29.

Abstract

The ability to sequence the entire genome of virtually any organism provides unprecedented insights into the evolutionary history of populations and species. Nevertheless, many population genomic inferences - including the quantification and dating of admixture, introgression and demographic events, and the inference of selective sweeps - are still limited by the lack of high-quality haplotype information. The newest generation of sequencing technology now promises significant progress. To establish the feasibility of haplotype-resolved genome resequencing at population scale, we investigated properties of linked-read sequencing data of songbirds of the genus Oenanthe across a range of sequencing depths. Our results, based on comparing downsampled data (25×, 20×, 15×, 10×, 7×, and 5×) with high-coverage data (46-68×) from seven bird genomes mapped to a reference, suggest that phasing contiguities and accuracies adequate for most population genomic analyses can already be reached with moderate sequencing effort. At 15× coverage, phased haplotypes span about 90% of the genome assembly, with 50% and 90% of phased sequences located in phase blocks longer than 1.25-4.6 Mb (N50) and 0.27-0.72 Mb (N90), respectively. Phasing accuracy exceeds 99% from 15× coverage upwards. Higher coverages yielded higher contiguities (up to about 7 Mb/1 Mb [N50/N90] at 25× coverage), but only marginally improved phasing accuracy. Phase block contiguity improved with input DNA molecule length; thus, higher-quality DNA may help keep sequencing costs at bay. In conclusion, even for organisms with gigabase-sized genomes like birds, linked-read sequencing at moderate depth opens an affordable avenue towards haplotype-resolved genome resequencing at population scale.
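The N50 and N90 contiguity statistics quoted above follow a simple definition: the block length L such that phase blocks of length L or longer contain at least 50% (or 90%) of all phased sequence. The following is a minimal sketch of that definition only. The function name `phase_block_nxx` and the example block lengths are hypothetical and are not taken from the study, and the abstract does not say which software produced the reported values.

```python
def phase_block_nxx(block_lengths, fraction):
    """Return the Nxx statistic for a collection of phase block lengths.

    The result is the length L such that blocks of length >= L together
    hold at least `fraction` of the total phased sequence
    (fraction=0.5 gives N50, fraction=0.9 gives N90).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    lengths = sorted(block_lengths, reverse=True)  # longest blocks first
    if not lengths:
        raise ValueError("at least one phase block is required")

    threshold = fraction * sum(lengths)
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= threshold:
            return length
    return lengths[-1]  # only reachable through floating-point rounding


# Hypothetical phase block lengths in base pairs (illustration only).
example_blocks = [5_000_000, 4_000_000, 3_000_000, 2_000_000, 1_000_000]
print("N50:", phase_block_nxx(example_blocks, 0.5))  # N50: 4000000
print("N90:", phase_block_nxx(example_blocks, 0.9))  # N90: 2000000
```

Because N90 requires a larger share of the phased sequence to be covered, it can never exceed N50 for the same set of blocks, which matches the ordering of the values reported in the abstract.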

Keywords: admixture; demography; introgression; phasing; population genomics; selective sweeps.

MeSH terms

  • Animals
  • Genetics, Population*
  • Genomics*
  • Haplotypes*
  • High-Throughput Nucleotide Sequencing
  • Sequence Analysis, DNA
  • Songbirds / genetics*