The empirical codon mutation matrix as a communication channel

BMC Bioinformatics. 2014 Mar 22;15:80. doi: 10.1186/1471-2105-15-80.

Abstract

Background: A number of evolutionary models have been widely used for sequence alignment, phylogenetic tree reconstruction, and database searches. These models focus on how sets of independent substitutions between amino acids or codons derive one protein sequence from its ancestral sequence during evolution. In this paper, we regard the Empirical Codon Mutation (ECM) Matrix as a communication channel and compute the corresponding channel capacity.

Results: A channel capacity of 4.1875 bits, which is needed to preserve the information determined by the amino acid distribution, is obtained with an exponential factor of 0.26 applied to the ECM matrix. Additionally, we have obtained the optimum capacity-achieving codon distribution. It differs noticeably from the biological distribution; however, the distribution among synonymous codons is preserved. More importantly, the results show that the biological codon distribution allows for a "transmission" at a rate very close to the capacity.

Conclusion: We computed an exponential factor for the ECM matrix that would still allow for preserving the genetic information given the redundancy that is present in the codon-to-amino acid mapping. This gives insight into how such a mutation matrix relates to the preservation of a species in an information-theoretic sense.
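The abstract does not say which numerical method was used to obtain the capacity and the capacity-achieving distribution. As an illustration only, the sketch below uses the standard Blahut-Arimoto iteration for a discrete memoryless channel. The function name `channel_capacity`, its parameters, and the toy channels are assumptions made for this example; they are not taken from the paper, and no ECM data is used.

```python
import math


def channel_capacity(P, tol=1e-10, max_iter=10000):
    """Blahut-Arimoto sketch (hypothetical helper, not the paper's code).

    P[x][y] is the probability of receiving symbol y when x was sent;
    every row must sum to 1. Returns (capacity in bits, input distribution).
    """
    n_in, n_out = len(P), len(P[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(p[x] * P[x][y] for x in range(n_in)) for y in range(n_out)]
        # Kullback-Leibler divergence (in nats) between each row and q.
        d = [
            sum(P[x][y] * math.log(P[x][y] / q[y])
                for y in range(n_out) if P[x][y] > 0.0)
            for x in range(n_in)
        ]
        c = [math.exp(v) for v in d]
        z = sum(p[x] * c[x] for x in range(n_in))
        lower = math.log(z)   # lower bound on the capacity (nats)
        upper = max(d)        # upper bound on the capacity (nats)
        # Reweight inputs towards those with above-average divergence.
        p = [p[x] * c[x] / z for x in range(n_in)]
        if upper - lower < tol:
            break
    return lower / math.log(2), p


# Toy example: binary symmetric channel with crossover probability 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]
capacity_bits, best_input = channel_capacity(bsc)
print(round(capacity_bits, 4), best_input)
```

Under these assumptions, a row-stochastic codon transition matrix could be passed in place of the toy matrix, and the returned input distribution would play the role of the capacity-achieving codon distribution mentioned in the results.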

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry
  • Amino Acids / genetics
  • Animals
  • Base Sequence
  • Codon
  • Humans
  • Models, Genetic
  • Mutation*
  • Probability
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*

Substances

  • Amino Acids
  • Codon