cDNA Library Enrichment of Full Length Transcripts for SMRT Long Read Sequencing

PLoS One. 2016 Jun 21;11(6):e0157779. doi: 10.1371/journal.pone.0157779. eCollection 2016.

Abstract

The utility of a genome assembly depends not only on the quality of the assembled sequence but also on the quality of its gene annotations. The Pacific Biosciences Iso-Seq technology supports accurate eukaryotic gene model annotation because it reads full-length cDNA sequences directly, without the need for noisy short-read-based transcript assembly. We propose combining the TeloPrime Full Length cDNA Amplification kit with the Pacific Biosciences Iso-Seq technology in order to enrich for genuine full-length transcripts in the cDNA libraries. We provide evidence that TeloPrime outperforms the commonly used SMARTer PCR cDNA Synthesis Kit in identifying transcription start and end sites in Arabidopsis thaliana. Furthermore, we show that TeloPrime-based Pacific Biosciences Iso-Seq can be successfully applied to the polyploid genome of bread wheat (Triticum aestivum), not only to annotate gene models efficiently but also to identify novel transcription sites, gene homeologs, splicing isoforms and previously unidentified gene loci.

MeSH terms

  • Arabidopsis / genetics
  • Computer Systems*
  • DNA, Complementary / genetics
  • Data Curation
  • Databases, Genetic
  • Gene Library*
  • Genome, Plant
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • Sequence Analysis, DNA / methods*
  • Transcription Initiation Site
  • Triticum / genetics

Substances

  • DNA, Complementary
  • RNA, Messenger

Grants and funding

This work was supported by the Max Planck Society. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.