Dataset of the first transcriptome assembly of the tree crop "yerba mate" (Ilex paraguariensis) and systematic characterization of protein coding genes

Data Brief. 2018 Feb 10;17:1036-1040. doi: 10.1016/j.dib.2018.02.015. eCollection 2018 Apr.

Abstract

This contribution contains data associated with the research article entitled "Exploring the genes of yerba mate (Ilex paraguariensis A. St.-Hil.) by NGS and de novo transcriptome assembly" (Debat et al., 2014) [1]. Using a bioinformatic approach involving extensive NGS data analyses, we provide a resource encompassing the full transcriptome assembly of yerba mate, the first available reference for the genus Ilex L. This dataset (Supplementary files 1 and 2) consolidates the transcriptome-wide assembled sequences of I. paraguariensis and provides comprehensive annotation of the yerba mate protein coding genes through the integration of Arabidopsis thaliana databases. These data are a key resource for characterizing agronomically relevant genes in the tree crop yerba mate, a non-model species, and in related Ilex taxa. The raw sequencing data analyzed here are available in the DDBJ/ENA/GenBank (NCBI Resource Coordinators, 2016) [2] Sequence Read Archive (SRA) under accession SRP043293, and the assembled sequences have been deposited in the Transcriptome Shotgun Assembly Sequence Database (TSA) under accession GFHV00000000.
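As a minimal sketch of how the assembled transcripts might be loaded for downstream work, assuming the TSA records under GFHV00000000 have been downloaded as a standard FASTA file, the Python standard library is enough to parse the sequences and report basic counts. The function names `read_fasta` and `summarize` are hypothetical helpers for illustration, not part of the published dataset or any tool used by the authors.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into a {identifier: sequence} dict (hypothetical helper)."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        elif current_id is not None:
            # Sequence lines may be wrapped; normalize case while joining.
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def summarize(records: Dict[str, str]) -> Dict[str, float]:
    """Report transcript count, total assembled length and mean length (bp)."""
    lengths = [len(seq) for seq in records.values()]
    n = len(lengths)
    total = sum(lengths)
    return {
        "transcripts": n,
        "total_bp": total,
        "mean_length": total / n if n else 0.0,
    }
```

For a local copy of the assembly (the filename is an assumption), usage would be `with open("GFHV01_transcripts.fasta") as fh: print(summarize(read_fasta(fh)))`. Reading line by line keeps the parser independent of how the FASTA export wraps sequence lines.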