Validating the significance of genomic properties of Chi sites from the distribution of all octamers in Escherichia coli

Gene. 2007 May 1;392(1-2):239-46. doi: 10.1016/j.gene.2006.12.022. Epub 2007 Jan 12.

Abstract

Chi sites (5'-GCTGGTGG-3') are octamer sequences that act as hotspots of homologous recombination by attenuating the exonuclease activity of RecBCD in Escherichia coli. They are overrepresented in the genome (1008 occurrences), preferentially located within coding regions (98%), oriented in the direction of replication (75%), and occur most commonly on the mRNA-synonymous sense strand of the double helix (79%). Previous statistical studies of the genome sequence suggested that these genomic properties of Chi sites are related to their role in recombinational repair, and therefore to replication and transcription. In this study, we employ three mathematical models to predict the properties of Chi sites from single nucleotide and multi-nucleotide compositions, and validate them statistically using the distribution of all octamer sequences in the entire genome or exclusively within ORFs. The model based on the overall distribution of all octamers provided better predictions than the single nucleotide composition model, and the ORF and sense strand preferences of Chi sites were shown to lie within the standard deviation of all octamers. In contrast, the orientation bias of Chi sites in the direction of replication was significant, although less pronounced than under the single nucleotide composition model, suggesting a selective pressure related to the role of RecBCD in replication.
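The core comparison in the abstract judges a Chi property against the spread of the same property over all octamers, rather than against an expectation derived only from single nucleotide composition. It can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The helper names (`octamer_counts`, `expected_single_nt`, `strand_fraction`, `strand_bias_z`) are hypothetical. The input is assumed to be a single upper-case ACGT strand that is already oriented, for example one replichore written 5' to 3' in the direction of replication. Strand bias is measured simply as the fraction of an octamer's occurrences that fall on that strand.

```python
import math
import statistics
from collections import Counter

CHI = "GCTGGTGG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def octamer_counts(seq: str) -> Counter:
    """Overlapping octamer counts, read 5'->3' on the strand given.

    Assumption: `seq` contains only upper-case A, C, G, T.
    """
    if set(seq) - set("ACGT"):
        raise ValueError("expected an upper-case ACGT sequence")
    return Counter(seq[i:i + 8] for i in range(len(seq) - 7))


def expected_single_nt(seq: str, word: str) -> float:
    """Expected count of `word` on one strand if each base were an
    independent draw from that strand's own A/C/G/T frequencies
    (a single nucleotide composition model)."""
    n = len(seq)
    freq = {b: seq.count(b) / n for b in "ACGT"}
    return (n - len(word) + 1) * math.prod(freq[b] for b in word)


def strand_fraction(counts: Counter, word: str) -> float:
    """Share of `word`'s occurrences on this strand. Hits on the other
    strand are the forward-strand hits of revcomp(word)."""
    fwd, rev = counts[word], counts[revcomp(word)]
    return fwd / (fwd + rev)


def strand_bias_z(seq: str, word: str = CHI, min_total: int = 10) -> float:
    """z-score of `word`'s strand fraction against every other
    non-palindromic octamer seen at least `min_total` times on both
    strands combined (hypothetical threshold)."""
    counts = octamer_counts(seq)
    # Include words seen only as reverse complements so each pair
    # (w, revcomp(w)) contributes both of its fractions.
    seen = set(counts) | {revcomp(w) for w in counts}
    background = [
        strand_fraction(counts, w)
        for w in seen
        if w not in (word, revcomp(word))
        and w != revcomp(w)
        and counts[w] + counts[revcomp(w)] >= min_total
    ]
    mu = statistics.mean(background)
    sd = statistics.stdev(background)
    return (strand_fraction(counts, word) - mu) / sd
```

If `seq` is a replichore read in the direction of replication, `strand_bias_z(seq)` approximates the orientation question. A large positive value would mean Chi's bias exceeds the spread seen across all octamers. `expected_single_nt` provides the composition-only baseline for comparison. Restricting `seq` to concatenated ORF sense strands would roughly approximate the sense-strand question, although concatenation creates junction octamers that this sketch does not remove.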

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Chromosome Mapping*
  • DNA Replication
  • Escherichia coli / genetics*
  • Genetic Markers*
  • Models, Theoretical
  • Open Reading Frames
  • RNA, Messenger / genetics
  • Recombination, Genetic / genetics
  • Statistical Distributions

Substances

  • Genetic Markers
  • RNA, Messenger