Two long read-based genome assembly and annotation of polyploidy woody plants, Hibiscus syriacus L. using PacBio and Nanopore platforms

Sci Data. 2023 Oct 18;10(1):713. doi: 10.1038/s41597-023-02631-z.

Abstract

Improvements in long read DNA sequencing and related techniques facilitated the generation of complex eukaryotic genomes. Despite these advances, the quality of constructed plant reference genomes remains relatively poor due to the large size of genomes, high content of repetitive sequences, and wide variety of ploidy. Here, we developed the de novo sequencing and assembly of high polyploid plant genome, Hibiscus syriacus, a flowering plant species of the Malvaceae family, using the Oxford Nanopore Technologies and Pacific Biosciences Sequel sequencing platforms. We investigated an efficient combination of high-quality and high-molecular-weight DNA isolation procedure and suitable assembler to achieve optimal results using long read sequencing data. We found that abundant ultra-long reads allow for large and complex polyploid plant genome assemblies with great recovery of repetitive sequences and error correction even at relatively low depth Nanopore sequencing data and polishing compared to previous studies. Collectively, our combination provides cost effective methods to improve genome continuity and quality compared to the previously reported reference genome by accessing highly repetitive regions. The application of this combination may enable genetic research and breeding of polyploid crops, thus leading to improvements in crop production.

Publication types

  • Dataset

MeSH terms

  • Genome, Plant*
  • Hibiscus* / genetics
  • High-Throughput Nucleotide Sequencing / methods
  • Nanopores*
  • Plant Breeding
  • Polyploidy
  • Sequence Analysis, DNA / methods