Conservation vs. variation of dinucleotide frequencies across bacterial and archaeal genomes: evolutionary implications

Front Microbiol. 2013 Sep 6:4:269. doi: 10.3389/fmicb.2013.00269. eCollection 2013.

Abstract

During the long history of biological evolution, genome structures have undergone enormous changes. Nevertheless, some traits or vestiges of the primordial genome (defined in this paper as the most primitive nucleic acid genome for life on Earth) may remain in modern genetic systems. Finding these traits or vestiges is of great importance for the study of the origin and evolution of genomes. The shorter a sequence is, the less likely it is to be modified during genome evolution, and, if mutated, the more easily it can reappear at the same site or at another site. Consequently, the genomic frequencies of very short nucleotide sequences, such as dinucleotides, have a considerable chance of being conserved over billions of years of evolution. Prokaryotic genomes are very diverse and span a wide range of GC content. Therefore, to find traits or vestiges of the primordial genome that remain in modern genetic systems, we studied the characteristics of dinucleotide frequencies across bacterial and archaeal genomes. We analyzed the dinucleotide frequency patterns of whole-genome sequences from more than 1300 prokaryotic species (bacterial and archaeal genomes available as of December 2012). The results show that the frequencies of the dinucleotides AC, AG, CA, CT, GA, GT, TC, and TG are well conserved across genomes, while the frequencies of the other dinucleotides vary considerably among species. This conservation/variation pattern seems to correlate with the distributions of dinucleotides throughout a genome and across genomes. Further analysis indicates that the phenomenon is determined by the strand symmetry of genomic sequences (the second parity rule) and by GC content variation among genomes. We discuss some possible origins of strand symmetry and propose that the frequency conservation of some dinucleotides may provide insights into the genomic composition of the primordial genetic system.
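The quantities named in the abstract (dinucleotide frequencies, GC content, and strand symmetry) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, overlapping dinucleotides are counted on a single strand, and pairs containing ambiguous bases are skipped, all of which are assumptions made for illustration. Strand symmetry is approximated here as the absolute difference between the frequency of each dinucleotide and that of its reverse complement, which the second parity rule predicts to be close to zero in a long genomic sequence.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_frequencies(seq):
    """Relative frequencies of the 16 overlapping dinucleotides on one strand.

    Assumption: pairs containing any character other than A, C, G, T
    (for example N) are skipped rather than redistributed.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair[0] in BASES and pair[1] in BASES:
            counts[pair] += 1
    total = sum(counts.values())
    return {
        a + b: (counts[a + b] / total if total else 0.0)
        for a, b in product(BASES, repeat=2)
    }


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of the sequence."""
    seq = seq.upper()
    valid = sum(seq.count(base) for base in BASES)
    return (seq.count("G") + seq.count("C")) / valid if valid else 0.0


def strand_symmetry_gaps(freqs):
    """Absolute frequency gap between each dinucleotide and its reverse complement.

    Self-complementary dinucleotides (AT, TA, CG, GC) always give a gap of 0.
    """
    return {d: abs(freqs[d] - freqs[reverse_complement(d)]) for d in freqs}


if __name__ == "__main__":
    toy = "ATGCGCATTAGCCGATAGCTAGGCTA"  # toy sequence, not real genomic data
    freqs = dinucleotide_frequencies(toy)
    print("GC content:", round(gc_content(toy), 3))
    for dinuc, gap in sorted(strand_symmetry_gaps(freqs).items()):
        print(dinuc, round(freqs[dinuc], 3), round(gap, 3))
```

Applied to many genomes, the per-genome frequency tables produced this way could be compared dinucleotide by dinucleotide to see which frequencies stay stable and which shift with GC content. A toy string this short will not show the near-zero gaps expected of whole genomes.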

Keywords: GC content; compositional analysis; dinucleotide frequency; origin and evolution of genomes; primordial genome; strand symmetry; whole-genome sequences.