Repetitive element signature-based visualization, distance computation, and classification of 1766 microbial genomes

Genomics. 2015 Jul;106(1):30-42. doi: 10.1016/j.ygeno.2015.04.004. Epub 2015 Apr 23.

Abstract

The genomes of living organisms are populated with pleomorphic repetitive elements (REs) of varying densities. Our hypothesis that genomic RE landscapes are species/strain/individual-specific was implemented in the Genome Signature Imaging system to visualize and compute the RE-based signature of any genome. Following the occurrence profiling of 5-nucleotide REs/words, the information from the 50 most frequent words was transformed into a genome-specific signature and visualized as Genome Signature Images (GSIs) using a CMYK scheme. An algorithm for computing distances among GSIs was formulated using the GSIs' variables (word identity, frequency, and frequency order). The utility of the GSI-distance computation system was demonstrated with control genomes. GSI-based computation of genome relatedness among 1766 microbes (117 archaea and 1649 bacteria) identified their clustering patterns; although the majority paralleled the established classification, some did not. The Genome Signature Imaging system, with its visualization and distance computation functions, enables genome-scale evolutionary studies involving numerous genomes of varying sizes.
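The profiling step described above (count overlapping 5-nucleotide words, keep the 50 most frequent, and compare signatures by word identity, frequency, and frequency order) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the GSI-distance formula, so `signature_distance` is a hypothetical stand-in, and the function names, the lexicographic tie-breaking, the skipping of non-ACGT windows, and the equal weighting of rank and frequency terms are all assumptions. The CMYK image rendering is not reproduced.

```python
from collections import Counter


def kmer_profile(seq: str, k: int = 5) -> Counter:
    """Count overlapping k-letter words; windows with non-ACGT symbols are skipped (assumption)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def top_words(counts: Counter, n: int = 50) -> list[tuple[str, float]]:
    """Return the n most frequent words as (word, relative frequency), highest first.

    Ties are broken lexicographically so the ranking is deterministic (assumption).
    """
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return [(word, c / total) for word, c in ranked]


def signature_distance(sig_a: list[tuple[str, float]], sig_b: list[tuple[str, float]]) -> float:
    """HYPOTHETICAL distance in [0, 1] using word identity, frequency, and frequency order.

    A word present in only one signature costs 1.0; a shared word costs the mean of its
    normalized rank difference and its absolute frequency difference.
    """
    pos_a = {w: (r, f) for r, (w, f) in enumerate(sig_a)}
    pos_b = {w: (r, f) for r, (w, f) in enumerate(sig_b)}
    words = sorted(set(pos_a) | set(pos_b))  # sorted for a deterministic summation order
    if not words:
        return 0.0
    denom = max(max(len(sig_a), len(sig_b)) - 1, 1)  # largest possible rank gap
    total = 0.0
    for w in words:
        if w in pos_a and w in pos_b:
            (ra, fa), (rb, fb) = pos_a[w], pos_b[w]
            total += 0.5 * abs(ra - rb) / denom + 0.5 * abs(fa - fb)
        else:
            total += 1.0  # identity mismatch penalty
    return total / len(words)


if __name__ == "__main__":
    sig1 = top_words(kmer_profile("ACGTACGTTTACGGA" * 20))
    sig2 = top_words(kmer_profile("GGGCCCATATACGT" * 20))
    print(signature_distance(sig1, sig2))
```

Identical signatures score 0 and signatures sharing no words score 1; the rank and frequency terms only matter for shared words, which is one simple way to let "word identity, frequency, and frequency order" all contribute.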

Keywords: Genome distance; Genome signature; Genome visualization; Genome-scale classification; Microbial genomes; Repetitive element.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis
  • DNA / chemistry
  • Evolution, Molecular
  • Genome, Archaeal*
  • Genome, Bacterial*
  • Genomics / methods*
  • Mutation, Missense
  • Repetitive Sequences, Nucleic Acid

Substances

  • DNA