Is a genome a codeword of an error-correcting code?

PLoS One. 2012;7(5):e36644. doi: 10.1371/journal.pone.0036644. Epub 2012 May 23.

Abstract

Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Base Sequence / genetics*
  • DNA Repair / genetics*
  • Genome / genetics*
  • Lactococcus lactis / genetics
  • Models, Genetic*