A telomere-to-telomere reference genome provides genetic insight into the pentacyclic triterpenoid biosynthesis in Chaenomeles speciosa

Hortic Res. 2023 Sep 14;10(10):uhad183. doi: 10.1093/hr/uhad183. eCollection 2023 Oct.

Abstract

Chaenomeles speciosa (2n = 34), a medicinal and edible plant in the Rosaceae, is commonly used in traditional Chinese medicine. To date, the lack of a genome sequence and of genetic studies has impeded efforts to improve its medicinal value. Herein, we report the use of an integrative approach combining PacBio HiFi (third-generation) sequencing and Hi-C scaffolding to assemble a high-quality telomere-to-telomere genome of C. speciosa. The assembly comprised 650.4 Mb with a contig N50 of 35.5 Mb. Of this, 632.3 Mb was anchored to 17 pseudo-chromosomes, of which 12, 4, and 1 were represented by one, two, and four contigs, respectively. Eleven pseudo-chromosomes had telomere repeats at both ends, and four had telomere repeats at a single end. Repetitive sequences accounted for 49.5% of the genome, and a total of 45 515 protein-coding genes were annotated. The genome size of C. speciosa was similar to that of Malus domestica. Expanded and contracted gene families were identified and investigated for their association with metabolic pathways or biological processes. In particular, functional annotation characterized gene families associated with the biosynthetic pathway of oleanolic and ursolic acids, two abundant pentacyclic triterpenoids in the fruits of C. speciosa. Taken together, this telomere-to-telomere, chromosome-level genome of C. speciosa not only provides a valuable resource for understanding the biosynthesis of medicinal compounds in tissues, but also promotes understanding of the evolution of the Rosaceae.
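
The assembly statistics quoted in the abstract can be unpacked with a little arithmetic. The following is a minimal sketch, not the authors' pipeline: the `n50` helper and the toy contig lengths are hypothetical and only illustrate the usual definition of contig N50 (the contig length at which the cumulative sum of contigs, sorted from longest to shortest, first reaches half of the total assembly length). The remaining figures are taken directly from the abstract, and the derived values follow from them.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

# Hypothetical toy contig lengths (in Mb), NOT the real C. speciosa contigs.
toy_contigs = [40, 30, 20, 5, 5]
toy_n50 = n50(toy_contigs)

# Figures reported in the abstract.
assembly_mb = 650.4   # total assembly size
anchored_mb = 632.3   # sequence anchored to pseudo-chromosomes
anchored_fraction = anchored_mb / assembly_mb

# contigs per pseudo-chromosome -> number of such pseudo-chromosomes
contigs_per_chrom = {1: 12, 2: 4, 4: 1}
n_chroms = sum(contigs_per_chrom.values())
n_contigs = sum(k * v for k, v in contigs_per_chrom.items())

# Telomere repeats: 11 pseudo-chromosomes at both ends, 4 at one end.
telomere_ends = 11 * 2 + 4 * 1

print(f"toy N50: {toy_n50} Mb")
print(f"anchored: {anchored_fraction:.1%} of the assembly")
print(f"{n_contigs} contigs across {n_chroms} pseudo-chromosomes")
print(f"{telomere_ends} of {2 * n_chroms} chromosome ends carry telomere repeats")
```

Under these figures, roughly 97% of the assembly is anchored, the 17 pseudo-chromosomes are built from 24 contigs, and telomere repeats are detected at 26 of the 34 chromosome ends.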