Representation and high-quality annotation of the Physcomitrella patens transcriptome demonstrates a high proportion of proteins involved in metabolism in mosses

Plant Biol (Stuttg). 2005 May;7(3):238-50. doi: 10.1055/s-2005-837578.

Abstract

To gain insight into the transcriptome of the widely used plant model system Physcomitrella patens, several EST sequencing projects have been undertaken. We have clustered, assembled, and annotated all publicly available EST and CDS sequences in order to represent the transcriptome of this non-seed plant. Here, we present our fully annotated knowledge resource for the Physcomitrella patens transcriptome. It integrates annotation from the production process of the clustered sequences with annotation from a high-quality pipeline developed during this study. Each transcript is represented as an entity containing full annotations and GO term associations. The whole production, filtering, clustering, and annotation process has been modelled and results in seven datasets, representing the annotated Physcomitrella transcriptome from different perspectives. We were able to annotate 63.4 % of the 26 123 virtual transcripts. The transcript archetype covered by our clustered data is compared with a compilation based on all available full-length Physcomitrella CDS. The distribution of gene ontology annotations (GOA) for the virtual transcriptome of Physcomitrella patens shows that the ratios of the core molecular functions are consistent across plant GOA. However, the metabolism subcategory is over-represented in bryophytes compared with seed plants. This observation may indicate a wealth of alternative metabolic pathways in mosses compared with spermatophytes. All resources presented in this study have been made available to the scientific community through a suite of user-friendly web interfaces via www.cosmoss.org and form the basis for assembly and annotation of the moss genome, which will be sequenced in 2005.

MeSH terms

  • 3' Untranslated Regions / genetics
  • Bryopsida / genetics*
  • Bryopsida / metabolism
  • Databases, Nucleic Acid
  • Expressed Sequence Tags
  • Models, Biological
  • Open Reading Frames
  • Plant Proteins / genetics*
  • Plant Proteins / metabolism
  • Protein Biosynthesis
  • Transcription, Genetic*

Substances

  • 3' Untranslated Regions
  • Plant Proteins