Genomics of Compositae crops: reference transcriptome assemblies and evidence of hybridization with wild relatives

Mol Ecol Resour. 2014 Jan;14(1):166-77. doi: 10.1111/1755-0998.12163. Epub 2013 Sep 18.

Abstract

Although the Compositae harbours only two major food crops, sunflower and lettuce, many other species in this family are utilized by humans and have experienced various levels of domestication. Here, we have used next-generation sequencing technology to develop 15 reference transcriptome assemblies for Compositae crops or their wild relatives. These data allow us to gain insight into the evolutionary and genomic consequences of plant domestication. Specifically, we performed Illumina sequencing of Cichorium endivia, Cichorium intybus, Echinacea angustifolia, Iva annua, Helianthus tuberosus, Dahlia hybrida, Leontodon taraxacoides and Glebionis segetum, as well as 454 sequencing of Guizotia scabra, Stevia rebaudiana, Parthenium argentatum and Smallanthus sonchifolius. Illumina reads were assembled using Trinity, and 454 reads were assembled using MIRA and CAP3. We evaluated the coverage of the transcriptomes using BLASTX analysis of a set of ultra-conserved orthologs (UCOs) and recovered most of these genes (88-98%). We found a correlation between contig length and read length for the 454 assemblies, and greater contig lengths for the 454 compared with the Illumina assemblies. This suggests that longer reads can aid in the assembly of more complete transcripts. Finally, we compared the divergence of orthologs at synonymous sites (Ks) between Compositae crops and their wild relatives and found greater divergence when the progenitors were self-incompatible. We also found greater divergence between pairs of taxa that had some evidence of postzygotic isolation. For several more distantly related congeners, such as chicory and endive, we identified a signature of introgression in the distribution of Ks values.
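The UCO-based coverage check described above can be illustrated with a minimal sketch. It assumes BLASTX was run with assembled contigs as queries against UCO proteins as subjects, and that hits are available in the standard 12-column tabular format (query ID, subject ID, ..., e-value, bit score). The function name `uco_recovery` and the e-value cutoff of 1e-10 are hypothetical choices for illustration; the abstract does not state the thresholds actually used.

```python
def uco_recovery(blast_lines, uco_ids, max_evalue=1e-10):
    """Return the fraction of UCO proteins hit by at least one contig.

    blast_lines: iterable of tab-separated BLAST tabular lines (12 columns).
    uco_ids: identifiers of the UCO reference proteins (the BLAST subjects).
    max_evalue: assumed significance cutoff, not taken from the paper.
    """
    wanted = set(uco_ids)
    found = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        subject = fields[1]          # column 2: subject (UCO) identifier
        evalue = float(fields[10])   # column 11: e-value
        if subject in wanted and evalue <= max_evalue:
            found.add(subject)
    return len(found) / len(wanted) if wanted else 0.0


if __name__ == "__main__":
    # Made-up hits for illustration only; not data from the study.
    demo_hits = [
        "contig1\tUCO1\t90.0\t100\t5\t0\t1\t300\t1\t100\t1e-50\t200",
        "contig2\tUCO2\t40.0\t50\t20\t1\t1\t150\t1\t50\t0.5\t30",
    ]
    print(uco_recovery(demo_hits, ["UCO1", "UCO2"]))
```

Counting each UCO once, regardless of how many contigs hit it, keeps the measure a statement about gene recovery rather than about contig redundancy.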

Keywords: Compositae; crop genomics; hybridization; introgression; transcriptome.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Asteraceae / genetics*
  • Computational Biology
  • High-Throughput Nucleotide Sequencing
  • Molecular Sequence Data
  • Nucleic Acid Hybridization*
  • Transcriptome*

Associated data

  • SRA/SRP020001