De novo assembly of the Pseudomonas syringae pv. syringae B728a genome using Illumina/Solexa short sequence reads

FEMS Microbiol Lett. 2009 Feb;291(1):103-11. doi: 10.1111/j.1574-6968.2008.01441.x. Epub 2008 Dec 9.

Abstract

Illumina's Genome Analyzer generates ultra-short sequence reads, typically 36 nucleotides in length, and is primarily intended for resequencing. We tested the potential of this technology for de novo sequence assembly on the 6 Mbp genome of Pseudomonas syringae pv. syringae B728a with several freely available assembly software packages. Using an unpaired data set, velvet assembled >96% of the genome into contigs with an N50 length of 8289 nucleotides and an error rate of 0.33%. EDENA generated smaller contigs (N50 was 4192 nucleotides) and comparable error rates. SSAKE and VCAKE yielded shorter contigs with very high error rates. Assembly of paired-end sequence data carrying 400 bp inserts produced longer contigs (N50 up to 15 628 nucleotides), but with increased error rates (0.5%). Contig length and error rate were very sensitive to the choice of parameter values. Noncoding RNA genes were poorly resolved in de novo assemblies, while >90% of the protein-coding genes were assembled with 100% accuracy over their full length. This study demonstrates that, in practice, de novo assembly of 36-nucleotide reads can generate reasonably accurate assemblies from about 40 x deep sequence data sets. These draft assemblies are useful for exploring an organism's proteomic potential, at a very economic low cost.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromosome Mapping / methods*
  • Computational Biology
  • Genome, Bacterial*
  • Pseudomonas syringae / genetics*
  • Sequence Analysis, DNA
  • Software