Variance of allele balance calculated from low coverage sequencing data infers departure from a diploid state

BMC Bioinformatics. 2022 Apr 25;23(1):150. doi: 10.1186/s12859-022-04685-z.

Abstract

Background: Polyploidy and heterokaryosis are common and consequential genetic phenomena that increase the number of haplotypes in an organism and complicate whole-genome sequence analysis. Allele balance has been used to infer polyploidy and heterokaryosis in diverse organisms using read sets sequenced to greater than 50× whole-genome coverage. However, sequencing to adequate depth is costly if applied to multiple individuals or large genomes.

Results: We developed VCFvariance.pl to use the variance of allele balance to infer polyploidy and/or heterokaryosis at low sequence coverage. This analysis requires as little as 10× whole-genome coverage and reduces the allele balance profile to a single value, which can be used to determine whether an individual has two haplotypes or more than two. This approach was validated using simulated, synthetic, and authentic read sets from the oomycete species Bremia lactucae and Phytophthora infestans, the fungal species Saccharomyces cerevisiae, and the plant species Arabidopsis arenosa. This approach was deployed to determine that nine of 21 genotyped European race-type isolates of Bremia lactucae were inconsistent with diploidy and therefore likely heterokaryotic.
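
The idea of collapsing an allele balance profile into a single variance value can be illustrated with a minimal Python sketch. This is not the VCFvariance.pl implementation: the function names, the `min_depth` filter, the choice of population variance, and the read counts are all assumptions made for illustration. Allele balance is taken here as the fraction of reads at a heterozygous site that support the alternate allele.

```python
from statistics import pvariance


def allele_balance(ref_reads, alt_reads):
    """Fraction of reads at one site that support the alternate allele."""
    return alt_reads / (ref_reads + alt_reads)


def variance_of_allele_balance(sites, min_depth=10):
    """Population variance of allele balance across heterozygous sites.

    `sites` is an iterable of (ref_reads, alt_reads) pairs. Sites with
    fewer than `min_depth` total reads, or with only one allele observed,
    are skipped. The filter and its default are illustrative assumptions.
    """
    balances = [
        allele_balance(ref, alt)
        for ref, alt in sites
        if ref + alt >= min_depth and ref > 0 and alt > 0
    ]
    if len(balances) < 2:
        raise ValueError("need at least two usable sites")
    return pvariance(balances)


# Hypothetical read counts, not data from the study.
# Balances near 1/2 mimic a diploid; balances near 1/3 and 2/3 mimic a triploid.
diploid_like = [(5, 5), (6, 4), (4, 6), (5, 5), (7, 5), (5, 7)]
triploid_like = [(8, 4), (4, 8), (7, 3), (3, 7), (8, 4), (4, 8)]

print("diploid-like variance: ", variance_of_allele_balance(diploid_like))
print("triploid-like variance:", variance_of_allele_balance(triploid_like))
```

In this toy input the triploid-like sites give the larger variance, because their balances cluster away from 0.5. The abstract does not state the cutoff used to call a departure from diploidy, so none is assumed here.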

Conclusions: Variance of allele balance is a reliable metric to detect departures from a diploid state, including polyploidy, heterokaryosis, a mixed sample, or chromosomal copy number variation. Deploying this strategy is computationally inexpensive, can reduce the cost of sequencing by up to 80%, and can be used to test any organism.

Keywords: Allele frequency; Arabidopsis; Bremia lactucae; Heterokaryosis; Oomycete; Phytophthora; Polyploidy; Saccharomyces.

MeSH terms

  • Alleles
  • Arabidopsis* / genetics
  • DNA Copy Number Variations
  • Diploidy*
  • Haplotypes
  • Humans
  • Polyploidy