Rapid hybrid de novo assembly of a microbial genome using only short reads: Corynebacterium pseudotuberculosis I19 as a case study

J Microbiol Methods. 2011 Aug;86(2):218-23. doi: 10.1016/j.mimet.2011.05.008. Epub 2011 May 18.

Abstract

Due to the advent of the so-called Next-Generation Sequencing (NGS) technologies the amount of monetary and temporal resources for whole-genome sequencing has been reduced by several orders of magnitude. Sequence reads can be assembled either by anchoring them directly onto an available reference genome (classical reference assembly), or can be concatenated by overlap (de novo assembly). The latter strategy is preferable because it tends to maintain the architecture of the genome sequence the however, depending on the NGS platform used, the shortness of read lengths cause tremendous problems the in the subsequent genome assembly phase, impeding closing of the entire genome sequence. To address the problem, we developed a multi-pronged hybrid de novo strategy combining De Bruijn graph and Overlap-Layout-Consensus methods, which was used to assemble from short reads the entire genome of Corynebacterium pseudotuberculosis strain I19, a bacterium with immense importance in veterinary medicine that causes Caseous Lymphadenitis in ruminants, principally ovines and caprines. Briefly, contigs were assembled de novo from the short reads and were only oriented using a reference genome by anchoring. Remaining gaps were closed using iterative anchoring of short reads by craning to gap flanks. Finally, we compare the genome sequence assembled using our hybrid strategy to a classical reference assembly using the same data as input and show that with the availability of a reference genome, it pays off to use the hybrid de novo strategy, rather than a classical reference assembly, because more genome sequences are preserved using the former.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Corynebacterium pseudotuberculosis / genetics*
  • DNA, Bacterial / chemistry
  • DNA, Bacterial / genetics*
  • Genome, Bacterial*
  • High-Throughput Nucleotide Sequencing*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA, Bacterial