A novel skew analysis reveals substitution asymmetries linked to genetic code GC-biases and PolIII α-subunit isoforms

DNA Res. 2016 Aug;23(4):353-63. doi: 10.1093/dnares/dsw021. Epub 2016 Jun 26.

Abstract

Strand biases reflect deviations from a null expectation of DNA evolution that assumes strand-symmetric substitution rates. Here, we present strong evidence that nearest-neighbour preferences are a strand-biased feature of bacterial genomes, indicating neighbour-dependent substitution asymmetries. To detect such asymmetries, we introduce an alignment-free index (relative abundance skews). The profiles of relative abundance skews along coding sequences can trace the phylogenetic relations of bacteria, suggesting that the patterns of neighbour-dependent substitution strand-biases are not common among different lineages, but are rather species-specific. Analysis of neighbour-dependent and codon-site skews sheds light on the origins of substitution asymmetries. Via a simple model, we argue that the structure of the genetic code imposes position-dependent substitution strand-biases along coding sequences, as a response to GC mutation pressure. Thus, the organization of the genetic code per se can lead to an uneven distribution of nucleotides among different codon sites, even when requirements for specific codons and amino acids are not accounted for. Moreover, our results suggest that strand-biases in replication fidelity of PolIII α-subunit induce substitution asymmetries, both neighbour-dependent and neighbour-independent, on a genome scale. The role of DNA repair systems, such as transcription-coupled repair, is also considered.

Keywords: Chargaff’s second parity rule (PR2); GC mutational pressure; PolIII α-subunit isoforms; dinucleotide relative abundances (odds ratios); substitution strand-biases.
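
The abstract names the index (relative abundance skews, built on dinucleotide odds ratios) but does not give its formula. The following is a minimal illustrative sketch, not the authors' implementation. It assumes the commonly used odds ratio rho_xy = f_xy / (f_x * f_y) computed on a single strand, and it assumes a normalised difference between each dinucleotide and its reverse complement as the skew; that normalisation is an assumption made here for illustration. The function names `odds_ratios` and `relative_abundance_skews` are hypothetical.

```python
from collections import Counter
from typing import Dict

BASES = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def odds_ratios(seq: str) -> Dict[str, float]:
    """Single-strand dinucleotide odds ratios rho_xy = f_xy / (f_x * f_y).

    Only dinucleotides observed in `seq` are returned; windows containing
    characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    mono = Counter(base for base in seq if base in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if set(seq[i:i + 2]) <= BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    rho = {}
    for pair, count in di.items():
        f_x = mono[pair[0]] / n_mono
        f_y = mono[pair[1]] / n_mono
        rho[pair] = (count / n_di) / (f_x * f_y)
    return rho


def relative_abundance_skews(seq: str) -> Dict[str, float]:
    """ASSUMED skew: (rho_xy - rho_rc) / (rho_xy + rho_rc), where rc is the
    reverse complement of xy read on the same strand.

    Under strand-symmetric substitution (PR2) the two odds ratios are expected
    to be equal, so the skew is expected to be near zero. Self-complementary
    dinucleotides (AT, TA, CG, GC) are omitted because their skew is zero by
    construction.
    """
    rho = odds_ratios(seq)
    skews = {}
    for pair, value in rho.items():
        partner = reverse_complement(pair)
        if partner == pair:
            continue
        other = rho.get(partner, 0.0)  # partner never observed on this strand
        skews[pair] = (value - other) / (value + other)
    return skews
```

A profile along coding sequences, as described in the abstract, could be approximated under the same assumptions by applying `relative_abundance_skews` to successive windows of a sequence. A sequence concatenated with its own reverse complement is strand-symmetric by construction, so every skew computed from it is zero.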

MeSH terms

  • Algorithms
  • Bacteria / genetics
  • Bacterial Proteins / genetics
  • Bacterial Proteins / metabolism
  • Base Pairing*
  • DNA Polymerase III / genetics
  • DNA Polymerase III / metabolism
  • GC Rich Sequence*
  • Genetic Code*
  • Genome, Bacterial*
  • Models, Genetic*
  • Mutation Rate
  • Mutation*

Substances

  • Bacterial Proteins
  • DNA Polymerase III