Visualization of consensus genome structure without using a reference genome

BMC Genomics. 2017 Mar 14;18(Suppl 2):208. doi: 10.1186/s12864-017-3499-7.

Abstract

Background: Standard graphical tools for whole-genome comparison require a reference genome. However, any reference genome is itself subject to annotation biases and rearrangements, and outside a few extensively studied model species it may not serve as a reliable standard. To fully exploit the sequence data rapidly accumulating from recent sequencing technologies, a method for comparing genomes without any reference has long been anticipated.

Results: We introduce a circular genome visualizer for comparing the complete genomes of closely related species. The tool visualizes the positions of orthologous gene clusters rather than the actual sequences or their features, thereby achieving a comparative view without relying on a single reference genome. The essential input is the matrix of orthologous gene clusters, whose positions (not sequences) are color-coded in circular graphics. As a demonstration, a comparison of 14 Lactobacillus paracasei strains and one L. casei strain revealed not only large-scale rearrangements but also strain-specific genomic islands. A comparison of 73 Helicobacter pylori strains confirmed their genetic consistency and also revealed three general patterns of large-scale genome inversion.
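The idea of color-coding cluster positions rather than sequences can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes a hypothetical input format in which each strain has a genome length and a mapping from orthologous-cluster ID to the start coordinate of that strain's member gene, and that all genomes use a comparable coordinate origin (for example, the replication origin). Each cluster's consensus position is taken as the circular mean of its relative positions across strains and mapped to a hue; drawing every strain's genes at their own positions with these consensus colors makes rearranged segments stand out as breaks in the color gradient.

```python
from __future__ import annotations

import cmath
import colorsys
import math

# Hypothetical input format (an assumption, not the published tool's schema):
# {"length": genome_length_bp, "positions": {cluster_id: start_coordinate_bp}}
Strain = dict


def relative_angle(position: int, genome_length: int) -> float:
    """Convert a base-pair coordinate to an angle (radians) on a circular genome."""
    return 2.0 * math.pi * (position % genome_length) / genome_length


def consensus_angles(strains: dict[str, Strain]) -> dict[str, float]:
    """Circular mean angle of each cluster over all strains that carry it."""
    sums: dict[str, complex] = {}
    for strain in strains.values():
        for cluster, pos in strain["positions"].items():
            theta = relative_angle(pos, strain["length"])
            sums[cluster] = sums.get(cluster, 0j) + cmath.exp(1j * theta)
    # cmath.phase returns a value in (-pi, pi]; shift it into [0, 2*pi).
    # If the unit vectors cancel exactly (sum == 0j), phase() returns 0.0.
    return {c: cmath.phase(z) % (2.0 * math.pi) for c, z in sums.items()}


def angle_to_rgb(theta: float) -> tuple[float, float, float]:
    """Map an angle to a fully saturated hue: 0 rad is red, then around the color wheel."""
    return colorsys.hsv_to_rgb(theta / (2.0 * math.pi), 1.0, 1.0)


def color_track(strain: Strain, consensus: dict[str, float]) -> list[tuple[float, tuple[float, float, float]]]:
    """For one strain, return (own angle, consensus color) pairs sorted by the strain's own position."""
    track = [
        (relative_angle(pos, strain["length"]), angle_to_rgb(consensus[cluster]))
        for cluster, pos in strain["positions"].items()
    ]
    return sorted(track)


if __name__ == "__main__":
    demo = {
        "A": {"length": 1000, "positions": {"c1": 0, "c2": 250, "c3": 500}},
        "B": {"length": 1000, "positions": {"c1": 0, "c2": 500, "c3": 250}},  # c2/c3 swapped
    }
    cons = consensus_angles(demo)
    for name, strain in demo.items():
        print(name, color_track(strain, cons))
```

In a real plot each (angle, color) pair would be drawn as a short arc on the strain's ring; the circular mean is used instead of an arithmetic mean so that positions on either side of the origin average correctly.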

Conclusions: From the abundant sequence data in the GenBank/ENA/DDBJ repositories, we can reconstruct a genomic consensus for a particular species. By visualizing multiple strains at a glance, we can identify both conserved and strain-specific regions in genomes that have been sequenced many times. The positional consistency of orthologous genes provides information orthogonal to major sequence features such as GC content or the sequence similarity of marker genes. Positional comparison is therefore useful for identifying large-scale genome rearrangements and gene transfers.
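One way to make "positional consistency" concrete is the mean resultant length R from circular statistics. The sketch below is an illustration under stated assumptions, not the measure used in the paper: it assumes each cluster's positions are already expressed as angles on a common circular coordinate system, and the classification threshold `min_r` and the function names are hypothetical. R is close to 1 when a cluster sits at nearly the same relative position in every strain that carries it, and it drops when the cluster has been moved, for example by an inversion or a transfer; clusters found in a single strain are reported separately as strain-specific.

```python
from __future__ import annotations

import cmath
import math


def positional_consistency(angles: list[float]) -> float:
    """Mean resultant length R in [0, 1] of a set of angles (radians)."""
    if not angles:
        raise ValueError("need at least one angle")
    return abs(sum(cmath.exp(1j * a) for a in angles)) / len(angles)


def classify_clusters(
    presence: dict[str, dict[str, float]], min_r: float = 0.9
) -> tuple[list[str], list[str], list[str]]:
    """Split clusters into (consistent, inconsistent, strain_specific).

    presence maps cluster_id -> {strain_id: angle}. The 0.9 threshold is a
    hypothetical choice for illustration only.
    """
    consistent, inconsistent, specific = [], [], []
    for cluster, by_strain in presence.items():
        if len(by_strain) == 1:
            specific.append(cluster)  # present in only one strain
        elif positional_consistency(list(by_strain.values())) >= min_r:
            consistent.append(cluster)  # same relative position across strains
        else:
            inconsistent.append(cluster)  # candidate rearrangement or transfer
    return consistent, inconsistent, specific


if __name__ == "__main__":
    demo = {
        "c1": {"A": 0.1, "B": 0.1, "C": 0.12},
        "c2": {"A": 0.5, "B": 0.5 + math.pi},  # opposite sides of the circle
        "c3": {"A": 2.0},
    }
    print(classify_clusters(demo))
```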

Keywords: Circular visualization; Comparative genomics; Helicobacter pylori; Lactobacillus casei.

MeSH terms

  • Base Composition
  • Chromosome Mapping / methods*
  • Computer Graphics
  • Databases, Nucleic Acid
  • Gene Rearrangement
  • Genome, Bacterial*
  • Genomic Islands
  • Helicobacter pylori / classification
  • Helicobacter pylori / genetics*
  • Lacticaseibacillus paracasei / classification
  • Lacticaseibacillus paracasei / genetics*
  • Multigene Family
  • Sequence Analysis, DNA / methods*