Shannon information in complete genomes

J Bioinform Comput Biol. 2005 Jun;3(3):587-608. doi: 10.1142/s0219720005001181.

Abstract

Shannon information in the genomes of all completely sequenced prokaryotes and eukaryotes is measured for word lengths of two to ten letters. It is found that, in a scale-dependent way, the Shannon information in complete genomes is much greater than that in matching random sequences, and thousands of times greater in the case of short words. Furthermore, with the exception of the 14 chromosomes of Plasmodium falciparum, the Shannon information in all available complete genomes belongs to a universality class given by an extremely simple formula. The data are consistent with a model for genome growth composed of two main ingredients: random segmental duplications that increase the Shannon information in a scale-independent way, and random point mutations that preferentially reduce the larger-scale Shannon information. The inference drawn from the present study is that the large-scale and coarse-grained growth of genomes was selectively neutral, which suggests an independent corroboration of Kimura's neutral theory of evolution.
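
The measurement described above can be illustrated with a minimal sketch. It assumes that the Shannon information at word length k is taken as the shortfall of the observed k-letter word entropy from its maximum value, k ln 4; the abstract does not state the formula, so this definition is an assumption. The function names (kmer_counts, shannon_information, shuffled_control) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_information(seq, k, alphabet_size=4):
    """Entropy deficit of the k-letter word distribution, in nats.

    Assumed definition: k * ln(alphabet_size) - H, where H is the
    Shannon entropy of the observed word frequencies. It is zero when
    all possible words are equally frequent and grows as the word
    usage becomes more uneven.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return k * math.log(alphabet_size) - entropy


def shuffled_control(seq, rng):
    """Random sequence with the same base composition as seq."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy input only: a short repeated motif stands in for a genome.
    genome = "ACGTTGCAAC" * 200
    control = shuffled_control(genome, rng)
    for k in range(2, 7):
        print(k, shannon_information(genome, k), shannon_information(control, k))
```

Shuffling preserves base composition, so the control differs from the input only in word order. For real genomes the finite sequence length biases the entropy estimate at larger k, which this sketch does not correct for.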

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chromosome Mapping / methods*
  • Computational Biology / methods
  • DNA / chemistry
  • DNA / genetics*
  • DNA Mutational Analysis / methods*
  • Evolution, Molecular*
  • Genetic Variation / genetics
  • Information Storage and Retrieval / methods*
  • Models, Genetic*
  • Models, Statistical
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA