Haplotype-resolved assembly of auto-polyploid genomes via combining Hi-C and gametic data

Sci Rep. 2024 Apr 3;14(1):7892. doi: 10.1038/s41598-024-58623-5.

Abstract

Haplotype-resolved genome assembly plays a crucial role in understanding allele-specific functions. However, obtaining a haplotype-resolved assembly for auto-polyploid genomes remains challenging. Existing methods can be classified into reference-based phasing, assembly-based phasing, and gamete binning, but cost-effective and efficient methods for haplotyping auto-polyploid genomes are still lacking. In this study, we propose a novel phasing algorithm called PolyGH, which combines Hi-C and gametic data. We evaluated PolyGH on tetraploid potato cultivars. The method proceeds in three steps. First, gametic data are used to bin non-collapsed contigs, and adjacent fragments of the same type within the same contig are then merged. Second, accurate Hi-C signals related to differential genomic regions are acquired using unique k-mers. Finally, collapsed fragments are assigned to haplotigs based on the combined Hi-C and gametic signals. In comparisons with Hi-C-based and gametic data-based methods, PolyGH, which integrates both data types, showed superior performance in haplotyping auto-polyploid genomes. This approach has the potential to enhance haplotype-resolved assembly for auto-polyploid genomes.
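
Two of the operations named in the abstract, merging adjacent fragments of the same type within a contig and selecting unique k-mers, can be illustrated with a minimal Python sketch. This is not the PolyGH implementation: the function names `merge_adjacent` and `unique_kmers`, the tuple layout `(contig, start, end, label)`, and the definition of "unique" as "present in exactly one input sequence" are assumptions made for illustration, and reverse complements are ignored for brevity.

```python
from collections import Counter


def merge_adjacent(fragments):
    """Merge neighbouring fragments that share a contig and a label.

    fragments: iterable of (contig, start, end, label) tuples (hypothetical layout).
    Returns a list of merged tuples ordered by contig and start position.
    """
    ordered = sorted(fragments, key=lambda f: (f[0], f[1]))
    merged = []
    for contig, start, end, label in ordered:
        if merged and merged[-1][0] == contig and merged[-1][3] == label:
            # Same contig and same label as the previous fragment: extend it.
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end), label)
        else:
            merged.append((contig, start, end, label))
    return merged


def unique_kmers(sequences, k):
    """Return, for each named sequence, the k-mers that occur in no other sequence.

    sequences: dict mapping a name to a DNA string.
    Assumption: only the forward strand is considered.
    """
    owners = Counter()  # number of sequences containing each k-mer
    per_seq = {}
    for name, seq in sequences.items():
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        per_seq[name] = kmers
        owners.update(kmers)
    return {
        name: {kmer for kmer in kmers if owners[kmer] == 1}
        for name, kmers in per_seq.items()
    }


if __name__ == "__main__":
    frags = [
        ("ctg1", 0, 100, "H1"),
        ("ctg1", 100, 200, "H1"),
        ("ctg1", 200, 300, "collapsed"),
        ("ctg2", 0, 50, "H1"),
    ]
    print(merge_adjacent(frags))
    print(unique_kmers({"a": "ACGTA", "b": "CGTAC"}, 3))
```

In this toy setting, the k-mers kept by `unique_kmers` are the ones that could tie a Hi-C read to a single fragment, which is the intuition behind using unique k-mers to obtain accurate Hi-C signals.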

Keywords: Auto-polyploid; Gametic data; Haplotype-resolved assembly; Hi-C; PacBio HiFi.

MeSH terms

  • Alleles
  • Germ Cells*
  • Haplotypes / genetics
  • Humans
  • Polyploidy*
  • Sequence Analysis, DNA / methods