EST assembly supported by a draft genome sequence: an analysis of the Chlamydomonas reinhardtii transcriptome

Nucleic Acids Res. 2007;35(6):2074-83. doi: 10.1093/nar/gkm081. Epub 2007 Mar 13.

Abstract

Clustering and assembly of expressed sequence tags (ESTs) constitute the basis for most genomewide descriptions of a transcriptome. This approach is limited by the decline in sequence quality toward the end of each EST, impacting both sequence clustering and assembly. Here, we exploit the available draft genome sequence of the unicellular green alga Chlamydomonas reinhardtii to guide clustering and to correct errors in the ESTs. We have grouped all available EST and cDNA sequences into 12,063 ACEGs (assembly of contiguous ESTs based on genome) and generated 15,857 contigs of average length 934 nt. We predict that roughly 3000 of our contigs represent full-length transcripts. Compared to previous assemblies, ACEGs show extended contig length, increased accuracy and a reduction in redundancy. Because our assembly protocol also uses ESTs with no corresponding genomic sequences, it provides sequence information for genes interrupted by sequence gaps. Detailed analysis of randomly sampled ACEGs reveals several hundred putative cases of alternative splicing, many overlapping transcription units and new genes not identified by gene prediction algorithms. Our protocol, although developed for and tailored to the C. reinhardtii dataset, can be exploited by any eukaryotic genome project for which both a draft genome sequence and ESTs are available.
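
The genome-guided clustering idea described above can be illustrated with a minimal sketch. This is not the authors' protocol: it assumes each EST has already been aligned to the draft genome by some external tool, and that each alignment is summarized as a scaffold name, an inferred transcript strand, and a start/end interval. The `Alignment` record, the `cluster_by_genome` function, and the example coordinates are all hypothetical names and values introduced for illustration. ESTs whose genomic footprints overlap on the same scaffold and strand are placed in one cluster.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical summary of one EST-to-genome alignment."""
    est_id: str
    scaffold: str
    strand: str   # inferred transcript strand, "+" or "-" (an assumption)
    start: int    # 0-based, inclusive
    end: int      # exclusive


def cluster_by_genome(alignments):
    """Group ESTs whose genomic intervals overlap on the same scaffold and strand."""
    by_locus = defaultdict(list)
    for aln in alignments:
        by_locus[(aln.scaffold, aln.strand)].append(aln)

    clusters = []
    for key in sorted(by_locus):
        # Sweep left to right, extending the current cluster while intervals overlap.
        hits = sorted(by_locus[key], key=lambda a: (a.start, a.end))
        current = [hits[0].est_id]
        current_end = hits[0].end
        for aln in hits[1:]:
            if aln.start < current_end:
                current.append(aln.est_id)
                current_end = max(current_end, aln.end)
            else:
                clusters.append(current)
                current = [aln.est_id]
                current_end = aln.end
        clusters.append(current)
    return clusters


# Made-up coordinates, for illustration only.
example = [
    Alignment("est1", "scaffold_1", "+", 100, 600),
    Alignment("est2", "scaffold_1", "+", 550, 1200),
    Alignment("est3", "scaffold_1", "+", 5000, 5400),
    Alignment("est4", "scaffold_1", "-", 500, 900),
    Alignment("est5", "scaffold_2", "+", 100, 400),
]
clusters = cluster_by_genome(example)
print(clusters)
```

In this toy input, est1 and est2 overlap and fall into one cluster, while est4 stays separate because it lies on the opposite strand, which is one simple way to keep overlapping transcription units apart. ESTs with no genomic match, which the abstract says are also used, would need a separate sequence-similarity step that this sketch does not cover.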

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algal Proteins / genetics*
  • Algorithms
  • Animals
  • Chlamydomonas reinhardtii / genetics*
  • Contig Mapping
  • Expressed Sequence Tags / chemistry*
  • Genomics*
  • Models, Genetic
  • Transcription, Genetic

Substances

  • Algal Proteins