Enrichment of genomic resources and identification of simple sequence repeats from medicinally important Clausena excavata

3 Biotech. 2018 Mar;8(3):133. doi: 10.1007/s13205-018-1162-x. Epub 2018 Feb 15.

Abstract

To expand the genomic information available for Clausena excavata, an important medicinal plant in many Asian countries, RNA sequencing (RNA-seq) analysis was performed. De novo assembly of 17,580,456 trimmed clean reads generated a total of 16,638 non-redundant unigenes (≥ 300 bp) with an average length of 755 bp. Functional categorization of the unigenes by Gene Ontology (GO) terms assigned 2305 genes to the cellular component category, 5577 to biological processes, and 8056 to molecular functions. The top sub-category in biological processes was the metabolic process, with 4374 genes. Among the annotated genes, 3006 were mapped to 123 metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathway analysis tool. The search for simple sequence repeats (SSRs) identified 845 SSRs in 749 SSR-containing unigenes, and the most abundant SSR motif was AAG/CTT, with 179 occurrences. Twelve SSR markers were tested for cross-transferability among five Clausena species; eight of them exhibited polymorphism. Taken together, these data provide valuable resources for genomic and genetic studies of Clausena species and for related studies. The transcriptome shotgun assembly data have been deposited at DDBJ/EMBL/GenBank under the accession GGEM00000000.
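The SSR search summarized above can be illustrated with a minimal Python sketch. The abstract does not name the SSR-detection tool or its thresholds, so the minimum repeat counts in `MIN_REPEATS`, the function names `find_ssrs` and `canonical_motif`, and the example sequence are all assumptions made for illustration, not the authors' method. The sketch scans a sequence for perfect di- to hexanucleotide repeats and labels each motif together with its reverse complement, which is how a class such as AAG/CTT arises.

```python
import re
from collections import Counter

# Assumed minimum repeat counts per motif length (not given in the abstract).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Return a strand-independent label for a motif, e.g. 'AAG/CTT'."""
    def smallest_rotation(s):
        return min(s[i:] + s[:i] for i in range(len(s)))

    revcomp = motif.translate(_COMPLEMENT)[::-1]
    labels = {smallest_rotation(motif), smallest_rotation(revcomp)}
    return "/".join(sorted(labels))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A unit of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip units that are a shorter motif repeated (e.g. ATAT, AAA).
            if any(
                unit_len % k == 0 and motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical unigene fragment containing (AAG)6 and (CT)7.
example = "TTGACC" + "AAG" * 6 + "CGTACGGA" + "CT" * 7 + "GGA"
ssrs = find_ssrs(example)
motif_counts = Counter(canonical_motif(motif) for _, motif, _ in ssrs)
```

Applied to every unigene in an assembly, a tally like `motif_counts` would yield the kind of motif frequency table from which the most abundant class is read off. Dedicated SSR tools also handle mononucleotide and compound repeats, which this sketch omits.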

Keywords: Clausena species; Medicinal plant; RNA-seq; SSR marker.