Improved detection of motifs with preferential location in promoters

Genome. 2010 Sep;53(9):739-52. doi: 10.1139/g10-042.

Abstract

Many transcription factor binding sites (TFBSs) involved in gene expression regulation are preferentially located relative to the transcription start site. In silico prediction approaches exploit this property; one of them scans promoters with a sliding window to detect local overrepresentation of motifs, and does so with considerable accuracy. Nevertheless, the consequences of the choice of sliding window size have never before been analysed. We propose automatically adapting this size to the distribution profile of each motif. This approach allows a better characterization of the topological constraints of the motifs and of the lists of genes containing them. Moreover, it allowed us to highlight a nonconstant frequency of occurrence of spurious motifs that could be counter-selected close to their functional area. Therefore, to improve the accuracy of in silico TFBS prediction and the sensitivity of promoter cartography, we propose that, in addition to automatic adaptation of window size, the nonconstant frequency of motifs in promoters be taken into account.
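
The idea of scanning with several window sizes and keeping the one that best fits a motif's positional profile can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`binom_sf`, `best_window`), the binomial tail score, and the uniform background are all assumptions made for illustration. In particular, the uniform null is exactly the simplification the abstract argues against, and p-values are compared across window sizes without any multiple-testing correction.

```python
from math import comb


def binom_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_window(positions, promoter_len, window_sizes, step=1):
    """Return (p_value, start, size) of the most locally enriched window.

    positions    : 0-based motif offsets pooled over promoters of equal length
    promoter_len : length of each promoter in bp
    window_sizes : candidate window sizes to try (hypothetical parameter)
    step         : shift between successive window starts

    Assumption: under the null, each occurrence falls in a window with
    probability size / promoter_len (uniform background).
    """
    n = len(positions)
    best = None
    for size in window_sizes:
        p = size / promoter_len
        for start in range(0, promoter_len - size + 1, step):
            # Count motif occurrences falling inside [start, start + size).
            k = sum(start <= x < start + size for x in positions)
            candidate = (binom_sf(k, n, p), start, size)
            if best is None or candidate < best:
                best = candidate
    return best
```

Calling `best_window(positions, 1000, [10, 50, 200], step=10)` on pooled motif offsets returns the window start and size with the smallest tail probability. A background frequency that varies along the promoter, as the abstract recommends, would replace the constant `p` with a position-dependent estimate.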

MeSH terms

  • Amino Acid Motifs
  • Arabidopsis / genetics*
  • Base Sequence / genetics
  • Binding Sites / genetics
  • DNA, Plant / genetics
  • Gene Expression Regulation, Plant
  • Genes, Plant
  • Genome, Plant
  • Promoter Regions, Genetic*
  • Protein Structure, Tertiary
  • Regulatory Sequences, Nucleic Acid
  • Repetitive Sequences, Nucleic Acid*
  • Sequence Analysis, DNA
  • Transcription Factors / chemistry*
  • Transcription Initiation Site

Substances

  • DNA, Plant
  • Transcription Factors