Sequence coverage required for accurate genotyping by sequencing in polyploid species

Mol Ecol Resour. 2022 May;22(4):1417-1426. doi: 10.1111/1755-0998.13558. Epub 2021 Dec 20.

Abstract

Polyploidy plays an important role in the evolution of eukaryotes, especially flowering plants. Many ecologically or agronomically important plant and crop species are polyploids, including sycamore maple (tetraploid) and the world's second and third largest food crops, wheat (hexaploid) and potato (tetraploid), as are economically important aquaculture animals such as Atlantic salmon and trout. Next-generation sequencing data make it possible to assign a genotype at a sequence variant site, an approach known as genotyping by sequencing (GBS). GBS has stimulated enormous interest in population-based genomics studies in almost all diploid and many polyploid organisms. DNA sequence polymorphisms are codominant and thus fully informative about the underlying genotype at the polymorphic site, making GBS a straightforward task in diploids. However, sequence data are often uninformative about the genotype in polyploid species, making GBS a far more challenging task in polyploids. This paper presents novel and rigorous statistical methods for predicting the number of sequence reads needed to ensure accurate GBS at a polymorphic site revealed by the reads in polyploids. It shows that about a dozen reads can ensure a probability of 95% of recovering all constituent alleles of any tetraploid genotype, but that several hundred reads are needed to accurately uncover the genotype with a probability of 90%, which challenges proposals in the literature to perform GBS with low-coverage sequence data. The theoretical prediction was tested using RAD-seq data from tetraploid potato cultivars. The paper provides polyploid experimentalists with theoretical guidance and methods for designing and conducting their sequence-based studies.
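The allele-recovery figure can be illustrated with a deliberately simplified calculation. The sketch below is not the paper's method: it assumes a biallelic site, reads drawn independently and with equal probability from the four homologous copies, and no sequencing error. The function names and the `dosage` parameter (number of copies of one allele in the genotype) are hypothetical and introduced only for this illustration.

```python
def prob_both_alleles_seen(n_reads, dosage, ploidy=4):
    """Probability that n_reads contain at least one read of each allele.

    Assumes a biallelic heterozygous genotype in which one allele is
    present in `dosage` of the `ploidy` copies, and that every read
    samples one copy uniformly at random, without sequencing error.
    """
    if not 0 < dosage < ploidy:
        raise ValueError("dosage must describe a heterozygous genotype")
    p = dosage / ploidy
    # Complement of "all reads show one allele" or "all show the other".
    return 1.0 - p ** n_reads - (1.0 - p) ** n_reads


def min_reads_for_recovery(dosage, target=0.95, ploidy=4):
    """Smallest read count whose recovery probability reaches `target`."""
    n = 1
    while prob_both_alleles_seen(n, dosage, ploidy) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Tetraploid heterozygotes: simplex (1), duplex (2), triplex (3).
    for dosage in (1, 2, 3):
        n = min_reads_for_recovery(dosage)
        print(f"dosage {dosage}/4: {n} reads for >= 95% recovery")
```

Under these assumptions the hardest heterozygotes (one or three copies of an allele) need 11 reads, and the balanced duplex genotype needs 6, which is in line with the "about a dozen reads" figure for recovering all alleles. The calculation says nothing about estimating allele dosage, which is the harder problem for which the abstract reports that several hundred reads are needed.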

Keywords: RAD-seq data; Solanum tuberosum L; genotyping by sequencing; polyploids; sequence coverage.

MeSH terms

  • Alleles
  • Diploidy
  • Genotype
  • Genotyping Techniques*
  • High-Throughput Nucleotide Sequencing* / methods
  • Plants* / genetics
  • Polyploidy*