Four basic symmetry types in the universal 7-cluster structure of microbial genomic sequences

In Silico Biol. 2005;5(3):265-82. Epub 2005 Jan 30.

Abstract

Coding information is the main source of heterogeneity (non-randomness) in the sequences of microbial genomes. This heterogeneity corresponds to a cluster structure in the triplet distributions of relatively short genomic fragments (200-400 bp). We found a universal 7-cluster structure in microbial genomic sequences and explained its properties. We show that the codon usage of bacterial genomes is, with high accuracy, a multi-linear function of their genomic G+C content. Based on an analysis of the 143 completely sequenced bacterial genomes available in GenBank in August 2004, we show that four "pure" types of the 7-cluster structure are observed. Animated 3D scatter plots of the cluster structure for all 143 genomes are collected in a database available on our website (http://www.ihes.fr/~zinovyev/7clusters). The findings can be readily incorporated into software for gene prediction, sequence alignment, or classification of microbial genomes.
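The triplet distributions of short fragments mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so everything here is an assumption made for illustration: the function names, the window width of 300 bp (chosen from the stated 200-400 bp range), the step size, and the choice to count non-overlapping triplets starting at the first base of each fragment. Each fragment becomes a 64-dimensional frequency vector. Clustering or 3D projection of those vectors is not shown.

```python
from itertools import product

# All 64 nucleotide triplets in a fixed order, plus a lookup table.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(fragment):
    """Return a 64-element list of triplet frequencies for one fragment.

    Assumption: triplets are non-overlapping and read from the first
    base of the fragment. Triplets with ambiguous bases are skipped.
    """
    counts = [0.0] * 64
    for i in range(0, len(fragment) - 2, 3):
        idx = INDEX.get(fragment[i:i + 3].upper())
        if idx is not None:
            counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def fragment_vectors(sequence, width=300, step=50):
    """Slide a window along the sequence; one frequency vector per window.

    width and step are illustrative values, not taken from the paper.
    """
    return [
        triplet_frequencies(sequence[start:start + width])
        for start in range(0, len(sequence) - width + 1, step)
    ]


def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases of the sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


if __name__ == "__main__":
    # Toy sequence, not real genomic data.
    toy = "ATG" * 100 + "GGC" * 100
    vectors = fragment_vectors(toy, width=300, step=150)
    print(len(vectors), "fragments, G+C content:", gc_content(toy))
```

Because the window start determines which reading frame the triplets fall into, fragments taken from coding regions in different frames would give different frequency vectors. This is consistent with the abstract's statement that coding information drives the cluster structure, though the sketch itself does not demonstrate the 7 clusters.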

MeSH terms

  • Codon
  • Genome, Archaeal*
  • Genome, Bacterial*
  • Internet
  • Multigene Family*

Substances

  • Codon