The evolution of DNA regulatory regions for proteo-gamma bacteria by interspecies comparisons

Genome Res. 2002 Feb;12(2):298-308. doi: 10.1101/gr.207502.

Abstract

The comparison of homologous noncoding DNA for organisms a suitable evolutionary distance apart is a powerful tool for the identification of cis regulatory elements for transcription and translation and for the study of how they assemble into functional modules. We have fit the three parameters of an affine global probabilistic alignment algorithm to establish the background mutation rate of noncoding sequence between E. coli and a series of gamma proteobacteria ranging from Salmonella to Vibrio. The lower bound we find to the neutral mutation rate is sufficiently high, even for Salmonella, that most of the conservation of noncoding sequence is indicative of selective pressures rather than of insufficient time to evolve. We then use a local version of the alignment algorithm combined with our inferred background mutation rate to assign a significance to the degree of local sequence conservation between orthologous genes, and thereby deduce a probability profile for the upstream regulatory region of all E. coli protein-coding genes. We recover 75%-85% (depending on significance level) of all regulatory sites from a standard compilation for E. coli, and 66%-85% of sigma sites. We also trace the evolution of known regulatory sites and the groups associated with a given transcription factor. Furthermore, we find that approximately one-third of paralogous gene pairs in E. coli have a significant degree of correlation in their regulatory sequence. Finally, we demonstrate an inverse correlation between the rate of evolution of transcription factors and the number of genes they regulate. Our predictions are available at http://www.physics.rockefeller.edu/~siggia.
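As a rough illustration of what an affine global alignment involves, the sketch below scores two sequences with a standard three-state (Gotoh-style) dynamic program. It is a score-based analogue, not the probabilistic algorithm fitted in the paper: the function name `affine_global_score`, the fixed match reward, and the default values of `mismatch`, `gap_open`, and `gap_extend` are all assumptions chosen for illustration, not the authors' fitted parameters.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, mismatch=-1.0, gap_open=-3.0, gap_extend=-1.0):
    """Best global alignment score of a and b under affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All parameter values are illustrative assumptions.
    """
    match = 1.0  # fixed reward for identical bases (assumption)
    n, m = len(a), len(b)

    # M: a[i-1] is aligned to b[j-1]
    # X: a[i-1] is aligned to a gap in b
    # Y: b[j-1] is aligned to a gap in a
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap is charged from M or from the other gap state;
            # staying in the same gap state is charged as an extension.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)

    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(affine_global_score("ACGT", "ACGT"))
    print(affine_global_score("ACGT", "AGT"))
```

A probabilistic version would replace the scores with log-probabilities of substitution, gap opening, and gap extension, and would typically sum over alignments rather than take the maximum; that step is not shown here.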

Publication types

  • Comparative Study

MeSH terms

  • Algorithms
  • Binding Sites / genetics
  • DNA, Bacterial / genetics*
  • Evolution, Molecular*
  • Gammaproteobacteria / genetics*
  • Genes, Bacterial
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Regulatory Sequences, Nucleic Acid / physiology
  • Species Specificity
  • Transcription Factors / genetics

Substances

  • DNA, Bacterial
  • Transcription Factors