Evolutionary conservation and functional implications of circular code motifs in eukaryotic genomes

Biosystems. 2019 Jan:175:57-74. doi: 10.1016/j.biosystems.2018.10.014. Epub 2018 Oct 24.

Abstract

A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses (Michel, 2015, 2017; Arquès and Michel, 1996). This set X has an interesting mathematical property: it is a maximal C3 self-complementary trinucleotide circular code (Arquès and Michel, 1996). Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the reading frame in genes. A recent study of the X motifs in the complete genome of the yeast Saccharomyces cerevisiae showed that they are significantly enriched in the reading frame of the genes (protein-coding regions) of the genome (Michel et al., 2017). It was suggested that these X motifs may be evolutionary relics of a primitive code originally used for gene translation. The aim of this paper is to address two questions: are X motifs conserved during evolution, and do they continue to play a functional role in the processes of genome decoding and protein production? In a large-scale analysis involving complete genomes from four mammals and nine different yeast species, we highlight specific evolutionary pressures on the X motifs in the genes of all the genomes, and identify important new properties of X motif conservation at the level of the encoded amino acids. We then compare the occurrence of X motifs with existing experimental data concerning protein expression and protein production, and report a significant correlation between the number of X motifs in a gene and increased protein abundance. More generally, this work suggests that motifs from circular codes, i.e. motifs having the property of reading frame retrieval, may represent functional elements located within the coding regions of extant genomes.
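As an illustration of the ideas in the abstract, the following minimal Python sketch lists the 20 trinucleotides of X as reported by Arquès and Michel (1996), checks the self-complementarity property, and counts X trinucleotides in the reading frame and the two shifted frames of a sequence. The function names (`reverse_complement`, `is_self_complementary`, `frame_counts`, `longest_x_run`) are hypothetical and chosen only for this example. The run-length measure is a simplifying assumption and not the X motif definition used in the paper, which may impose criteria that the abstract does not state.

```python
# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_self_complementary(code):
    """True if the reverse complement of every trinucleotide is also in the code."""
    return all(reverse_complement(t) in code for t in code)


def frame_counts(seq):
    """Count trinucleotides belonging to X in frames 0, 1 and 2 of seq.

    Frame 0 is assumed to be the reading frame; frames 1 and 2 are the
    two shifted frames. Incomplete trailing trinucleotides are ignored.
    """
    seq = seq.upper()
    counts = []
    for frame in range(3):
        n = sum(seq[i:i + 3] in X for i in range(frame, len(seq) - 2, 3))
        counts.append(n)
    return counts


def longest_x_run(seq, frame=0):
    """Length, in trinucleotides, of the longest run of consecutive
    X trinucleotides in the given frame (an illustrative measure only)."""
    seq = seq.upper()
    best = current = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in X:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


if __name__ == "__main__":
    toy = "GAAGAGGTA"  # toy sequence, not taken from any genome
    print(is_self_complementary(X))  # True
    print(frame_counts(toy))         # [3, 0, 1]
```

On the toy sequence, all three trinucleotides in frame 0 belong to X, while the shifted frames contain fewer. A real analysis would apply such counts to annotated coding sequences and assess the enrichment statistically.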

Keywords: Circular code motifs; Gene expression; Genetic code; Genome evolution.

MeSH terms

  • Algorithms*
  • Animals
  • Base Sequence
  • Eukaryota / genetics*
  • Eukaryota / physiology
  • Evolution, Molecular*
  • Genetic Code*
  • Genome*
  • Models, Genetic*
  • Nucleotide Motifs*
  • Sequence Homology