Persistent biases in the amino acid composition of prokaryotic proteins

Bioessays. 2006 Jul;28(7):726-38. doi: 10.1002/bies.20431.

Abstract

Correspondence analysis of 28 proteomes selected to span the entire realm of prokaryotes revealed universal biases in the proteins' amino acid distribution. Integral inner membrane proteins always form a distinct cluster, which can be used to predict protein localisation in unknown proteomes, independently of the organism's biotope or kingdom. Orphan proteins are consistently rich in aromatic residues. A second ubiquitous bias is that amino acid composition is driven by the G + C content of the first codon position. In many proteomes, an unexpected bias is driven by the AAN box of the genetic code (the codons for asparagine and lysine), suggesting some functional biochemical relationship between these two amino acids. Less significant biases are driven by the rare amino acids cysteine and tryptophan. Some of these biases allow identification of species-specific functions or localisations, such as surface or exported proteins. Correspondence analysis also reveals errors in genome annotations, making it useful for quality control and correction.
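
The abstract does not describe how the correspondence analysis was set up. As an illustration only, the textbook form of the technique can be sketched as follows, assuming each protein is represented as a row of counts over the 20 standard amino acids. The function names (composition_counts, correspondence_analysis) are hypothetical, NumPy is an assumed dependency, and nothing here should be read as the authors' actual pipeline.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_counts(sequences):
    """Build a proteins x amino-acids count table (hypothetical helper)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    counts = np.zeros((len(sequences), len(AMINO_ACIDS)))
    for row, seq in enumerate(sequences):
        for aa in seq.upper():
            col = index.get(aa)
            if col is not None:  # skip non-standard symbols such as X or *
                counts[row, col] += 1
    return counts


def correspondence_analysis(counts, n_axes=2):
    """Classical correspondence analysis of a contingency table via SVD.

    Returns row coordinates, column coordinates, the inertia of every
    axis, and a boolean mask of the columns that were kept.
    """
    counts = np.asarray(counts, dtype=float)
    if np.any(counts.sum(axis=1) == 0):
        raise ValueError("every row (protein) needs at least one count")

    # Columns that never occur have zero mass and cannot be scaled.
    keep = counts.sum(axis=0) > 0
    counts = counts[:, keep]

    P = counts / counts.sum()   # correspondence matrix
    r = P.sum(axis=1)           # row masses (one per protein)
    c = P.sum(axis=0)           # column masses (one per amino acid)

    # Standardised residuals: deviation from independence, chi-square scaled.
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)

    U, sing, Vt = np.linalg.svd(S, full_matrices=False)

    # Principal coordinates of rows and columns.
    row_coords = (U * sing) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]
    inertia = sing ** 2         # sums to chi-square / grand total

    return row_coords[:, :n_axes], col_coords[:, :n_axes], inertia, keep
```

In a setting like the one the abstract describes, the row coordinates would be plotted to look for clusters of proteins (for example membrane proteins), and the column coordinates would indicate which amino acids drive each axis. Choices such as weighting, filtering of short proteins, or the number of axes retained are left open here because the abstract does not state them.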

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry*
  • Biochemical Phenomena
  • Biochemistry
  • Cell Membrane Structures / metabolism
  • Chemical Phenomena
  • Chemistry, Physical
  • DNA / genetics
  • Multigene Family
  • Phylogeny
  • Prokaryotic Cells / chemistry*
  • Prokaryotic Cells / metabolism*

Substances

  • Amino Acids
  • DNA