A high-quality reference genome of wild Cannabis sativa

Hortic Res. 2020 May 2;7(1):73. doi: 10.1038/s41438-020-0295-3. eCollection 2020.

Abstract

Cannabis sativa is a well-known plant species that has great economic and ecological significance. An incomplete genome of cloned C. sativa was obtained by using SOAPdenovo software in 2011. To further explore the utilization of this plant resource, we generated an updated draft genome sequence for wild-type varieties of C. sativa in China using PacBio single-molecule sequencing and Hi-C technology. Our assembled genome is approximately 808 Mb, with scaffold and contig N50 sizes of 83.00 Mb and 513.57 kb, respectively. Repetitive elements account for 74.75% of the genome. A total of 38,828 protein-coding genes were annotated, 98.20% of which were functionally annotated. We provide the first comprehensive de novo genome of wild-type varieties of C. sativa distributed in Tibet, China. Due to long-term growth in the wild environment, these varieties exhibit higher heterozygosity and contain more genetic information. This genetic resource is of great value for future investigations of cannabinoid metabolic pathways and will aid in promoting the commercial production of C. sativa and the effective utilization of cannabinoids. The assembled genome is also a valuable resource for intensively and effectively investigating the C. sativa genome further in the future.

Keywords: Plant breeding; Plant evolution; Structural variation.