A scaffold analysis tool using mate-pair information in genome sequencing

J Biomed Biotechnol. 2008:2008:675741. doi: 10.1155/2008/675741.

Abstract

We have developed a Windows-based program, ConPath, as a scaffold analyzer. ConPath constructs scaffolds by ordering and orienting separate sequence contigs by exploiting the mate-pair information between contig-pairs. Our algorithm builds directed graphs from link information and traverses them to find the longest acyclic graphs. Using end read pairs of fixed-sized mate-pair libraries, ConPath determines relative orientations of all contigs, estimates the gap size of each adjacent contig pair, and reports wrong assembly information by validating orientations and gap sizes. We have utilized ConPath in more than 10 microbial genome projects, including Mannheimia succiniciproducens and Vibro vulnificus, where we verified contig assembly and identified several erroneous contigs using the four types of error defined in ConPath. Also, ConPath supports some convenient features and viewers that permit investigation of each contig in detail; these include contig viewer, scaffold viewer, edge information list, mate-pair list, and the printing of complex scaffold structures.

MeSH terms

  • Algorithms*
  • Base Pair Mismatch
  • Base Sequence
  • Contig Mapping / methods*
  • Molecular Sequence Data
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Software*