Evolution of transcription factor DNA binding sites

Gene. 2005 Mar 14;347(2):255-63. doi: 10.1016/j.gene.2004.12.013. Epub 2005 Feb 17.

Abstract

In bioinformatics, the binding of transcription regulatory factors to their cognate binding sites is usually described by a sequence-specific binding energy, which is estimated from a training sample of sites. This model implies that all binding sites with binding energy above some threshold are functional, and that site sequence variations should be considered neutral as long as they do not reduce this energy below the threshold. To quantify this energy, the binding profile (positional weight matrix, PWM) model or a consensus-based model is usually applied. Here we show that in many cases the available data are not sufficient to construct a relevant PWM, and that a modified consensus-based model could describe binding properties more effectively. Further, using data on binding sites of several transcription factors, we demonstrate that some non-consensus nucleotides in "orthologous sites" (that is, binding sites of the same factor upstream of orthologous genes), which have been believed to be irrelevant to regulation or even to hinder it, are evolutionarily very stable and specific to the regulated gene. For each pair of genomes considered, the number of substitutions between non-consensus nucleotides is far lower than the expected number of neutral substitutions. Moreover, in several positions of binding sites regulating different genes, non-consensus nucleotides are conserved in distant genomes. This means that there is selection pressure that maintains the stability of non-consensus nucleotides.
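As an illustration of the threshold model described above, the following minimal Python sketch builds a log-odds PWM from a small training sample, scores candidate sites, and lists the positions where a site departs from the consensus. It is not the authors' implementation: the toy sites, the pseudocount, the uniform background frequency, and all function names are assumptions chosen for illustration only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned training sites of equal length.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all training sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        # One column per position: log2 of (smoothed frequency / background).
        pwm.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pwm


def score_site(pwm, site):
    """Sum the per-position weights; used here as a proxy for binding energy."""
    return sum(column[base] for column, base in zip(pwm, site))


def consensus(pwm):
    """Return the highest-weight nucleotide at each position."""
    return "".join(max(column, key=column.get) for column in pwm)


def non_consensus_positions(pwm, site):
    """Return 0-based positions where the site differs from the consensus."""
    cons = consensus(pwm)
    return [i for i, (a, b) in enumerate(zip(site, cons)) if a != b]


# Hypothetical toy training sample, not data from the paper.
TRAINING_SITES = ["TTGACA", "TTGACA", "TTGACT", "TTTACA", "CTGACA"]
pwm = build_pwm(TRAINING_SITES)

# Under the threshold model, a site counts as functional if its score
# exceeds a chosen cutoff; the cutoff value here is arbitrary.
THRESHOLD = 0.0
is_functional = score_site(pwm, "TTGACT") > THRESHOLD
```

With only five training sites, each column of the matrix is dominated by the pseudocount, which is one way to see why a small sample may not support a relevant PWM. The `non_consensus_positions` helper marks the positions whose conservation across orthologous sites the abstract discusses.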

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Binding Sites
  • Consensus Sequence
  • DNA / metabolism*
  • Evolution, Molecular*
  • Models, Biological
  • Prokaryotic Cells / physiology
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors
  • DNA