Divergence and Shannon information in genomes

Phys Rev Lett. 2005 May 6;94(17):178103. doi: 10.1103/PhysRevLett.94.178103. Epub 2005 May 5.

Abstract

Shannon information (SI) and its special case, divergence, are defined for a DNA sequence in terms of probabilities of chemical words in the sequence and are computed for a set of complete genomes highly diverse in length and composition. We find the following: SI (but not divergence) is inversely proportional to sequence length for a random sequence but is length-independent for genomes; the genomic SI is always greater and, for shorter words and longer sequences, hundreds to thousands of times greater than the SI in a random sequence whose length and composition match those of the genome; genomic SIs appear to have word-length-dependent universal values. The universality is inferred to be an evolutionary footprint of a universal mode for genome growth.
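The abstract does not give the formulas, so the following is only an illustrative sketch of how a word-based information measure of this kind might be computed. It assumes that "words" are overlapping k-mers over the alphabet A, C, G, T, and it uses one common choice of measure: the gap between the maximum possible entropy, k ln 4, and the observed Shannon entropy of the k-mer distribution (equivalently, the Kullback-Leibler divergence from the uniform distribution). The function names are hypothetical, and the paper's own definitions of SI and divergence may differ.

```python
import math
from collections import Counter


def kmer_probabilities(seq, k):
    """Empirical probabilities of overlapping k-letter words in seq.

    Words containing anything other than A, C, G, T are ignored
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = {w: c for w, c in counts.items() if set(w) <= set("ACGT")}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid words of length k")
    return {w: c / total for w, c in counts.items()}


def shannon_entropy(probs):
    """Shannon entropy in nats of a word-probability table."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def information_deficit(seq, k):
    """Maximum entropy (k * ln 4) minus observed k-mer entropy.

    This is an assumed stand-in for a word-based information measure,
    not the definition used in the paper.
    """
    return k * math.log(4) - shannon_entropy(kmer_probabilities(seq, k))


if __name__ == "__main__":
    toy = "ACGT" * 100  # toy input, not real genomic data
    for k in (1, 2, 3):
        print(k, information_deficit(toy, k))
```

Under these assumptions a sequence that uses all words of length k equally often scores close to zero, while a highly repetitive sequence scores close to the maximum k ln 4. Comparing a genome with a shuffled sequence of the same length and base composition, as the abstract describes, would mean calling the same function on both.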

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • DNA / chemistry
  • Databases, Genetic
  • Entropy
  • Evolution, Molecular
  • Genes, Bacterial
  • Genome*
  • Genome, Bacterial
  • Models, Statistical
  • Molecular Sequence Data
  • Sequence Analysis, DNA
  • Software
  • Thermodynamics

Substances

  • DNA