Constraint on di-nucleotides by codon usage bias in bacterial genomes

Gene. 2014 Feb 15;536(1):18-28. doi: 10.1016/j.gene.2013.11.098. Epub 2013 Dec 11.

Abstract

Earlier studies reported that the relative di-nucleotide frequency (RDF) is similar across different parts of a genome but varies among genomes, so RDF has been termed a genome signature in bacteria. It is not known whether this constancy in RDF is governed by genome-wide mutational bias or by selection. Here we compared RDF between the inter-genic and the coding sequences of seventeen bacterial genomes for which gene expression data were available. Within a genome, the constraint on di-nucleotides was higher in the coding sequences than in the inter-genic regions, and higher at the 2nd codon position than at the 3rd. Further analysis revealed that the constraint on di-nucleotides at the 2nd codon position is greater in the high expression genes (HEG) than in the whole genome or in the low expression genes (LEG). We then analyzed RDF at the 2nd and the 3rd codon positions in simulated coding sequences, generated computationally by setting the codon usage bias (CUB) according to genome G+C composition while keeping the amino acid sequence unaltered. In the simulated coding sequences, the observed constraint was significantly low, and no significant difference in di-nucleotide constraint was observed between the HEG and the LEG. This indicates that the greater constraint on di-nucleotides in the HEG is due to stronger selection on CUB in these genes than in the LEG within a genome. We also compared RDF in the HEG rpoB and rpoC of 199 bacteria, which revealed a common pattern of constraints on di-nucleotides at the 2nd codon position across these bacteria. To validate the role of CUB in di-nucleotide constraint, we analyzed RDF at the 2nd and the 3rd codon positions in simulated rpoB/rpoC sequences. The analysis revealed that selection on CUB is an important contributor to the constraint on di-nucleotides at these positions in bacterial genomes. We believe these findings clarify the role of CUB in di-nucleotide constraint in bacterial genomes.
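
The abstract does not give the formula used for RDF. As an illustration only, the sketch below assumes the common odds-ratio form, in which the observed frequency of a di-nucleotide XY is divided by the product of the frequencies of X and Y, so that values far from 1 indicate over- or under-representation. The function name, the `codon_position` parameter, and the choice to treat "di-nucleotide at codon position p" as the pair starting at position p of each codon are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def relative_dinucleotide_frequency(seq, codon_position=None):
    """Odds-ratio RDF for each di-nucleotide in `seq` (hypothetical helper).

    codon_position=None scans every overlapping di-nucleotide.
    codon_position=1, 2 or 3 keeps only di-nucleotides that start at that
    position of each codon, assuming `seq` is in frame from its first base.
    """
    seq = seq.upper()
    if codon_position is None:
        starts = range(len(seq) - 1)
    else:
        starts = range(codon_position - 1, len(seq) - 1, 3)

    # Keep only di-nucleotides made of unambiguous bases.
    pairs = [seq[i:i + 2] for i in starts]
    pairs = [p for p in pairs if p[0] in BASES and p[1] in BASES]
    if not pairs:
        return {}

    n = len(pairs)
    observed = Counter(pairs)
    # Assumption: expected frequency uses the base frequencies seen in the
    # first and second slot of the sampled di-nucleotides.
    first = Counter(p[0] for p in pairs)
    second = Counter(p[1] for p in pairs)

    rdf = {}
    for x, y in product(BASES, repeat=2):
        expected = (first[x] / n) * (second[y] / n)
        if expected > 0:  # skip pairs whose bases never occur
            rdf[x + y] = (observed[x + y] / n) / expected
    return rdf

if __name__ == "__main__":
    toy_gene = "ATGGCCAAAGGTTAA"  # toy in-frame sequence, not real data
    for position in (2, 3):
        print(position, relative_dinucleotide_frequency(toy_gene, position))
```

Under these assumptions, comparing the output for `codon_position=2` and `codon_position=3` between real and simulated coding sequences mirrors the kind of comparison the abstract describes, though the paper's exact measure of "constraint" may differ.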

Keywords: Codon context; Codon usage bias (CUB); Gene expression; High expression genes (HEG); Inter-genic region; Low expression genes (LEG); Relative di-nucleotide frequency (RDF).

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins / genetics
  • Base Composition / physiology*
  • Base Pairing / physiology*
  • Cluster Analysis
  • Codon / genetics*
  • Computer Simulation
  • DNA-Directed RNA Polymerases / genetics
  • Gene Expression
  • Genetic Code / physiology
  • Genome, Bacterial*
  • Nucleotides / genetics
  • Open Reading Frames / genetics

Substances

  • Bacterial Proteins
  • Codon
  • Nucleotides
  • DNA-Directed RNA Polymerases