Sequence-based evaluation of promoter context for prediction of transcription start sites in Arabidopsis and rice

Sci Rep. 2022 Apr 28;12(1):6976. doi: 10.1038/s41598-022-11169-w.

Abstract

Genes are transcribed from transcription start sites (TSSs), and the positions of TSSs in a genome are strictly controlled to avoid mis-expression of undesired regions. In this study, we designed and developed a methodology for the evaluation of promoter context, which detects proximal promoter regions from -200 to -60 bp relative to a TSS, in Arabidopsis and rice genomes. The method positively evaluates spacer sequences and Regulatory Element Groups, but not core promoter elements like TATA boxes, and is able to predict the position of a TSS within a width of 200 bp. An important feature of the evaluation/prediction method is its independence from the core promoter elements, which was demonstrated by successful prediction of all the TATA, GA, and coreless types of promoters without notable differences in the accuracy of prediction. The positive relationship identified between the evaluation scores and gene expression levels suggests that this method is useful for the evaluation of promoter maturity.
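
The abstract does not describe the scoring scheme, so the following is only an illustrative sketch of the general idea of scoring the -200 to -60 bp window upstream of each candidate TSS and taking the best-scoring candidate. The function names, the k-mer weight table, and the additive scoring rule are all assumptions made for illustration; they are not the method published in the paper.

```python
from typing import Dict, Tuple

# Window boundaries taken from the abstract: -200 to -60 bp relative to the TSS.
UPSTREAM_FAR = 200
UPSTREAM_NEAR = 60


def proximal_region(genome: str, tss: int) -> str:
    """Return positions -200..-60 (inclusive) upstream of a forward-strand TSS.

    `tss` is the 0-based index of the +1 base, so position -n is index tss - n.
    """
    if tss < UPSTREAM_FAR:
        raise ValueError("TSS is too close to the start of the sequence")
    return genome[tss - UPSTREAM_FAR : tss - UPSTREAM_NEAR + 1]


def context_score(region: str, weights: Dict[str, float], k: int) -> float:
    """Sum hypothetical per-k-mer weights over every k-mer in the region.

    ASSUMPTION: an additive k-mer score stands in for the real evaluation,
    which the abstract does not specify.
    """
    return sum(weights.get(region[i : i + k], 0.0)
               for i in range(len(region) - k + 1))


def predict_tss(genome: str, weights: Dict[str, float], k: int) -> Tuple[int, float]:
    """Score every candidate TSS and return (position, score) of the best one.

    Ties are resolved in favour of the first (most upstream) candidate.
    """
    candidates = range(UPSTREAM_FAR, len(genome))
    scored = [(t, context_score(proximal_region(genome, t), weights, k))
              for t in candidates]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy sequence with one hypothetical "regulatory" motif at index 100.
    toy_genome = "A" * 100 + "GGGGGG" + "A" * 394
    toy_weights = {"GGG": 1.0}  # hypothetical weights, not real data
    print(predict_tss(toy_genome, toy_weights, k=3))
```

Because many neighbouring candidates share the same upstream window content, a score like this tends to form a plateau rather than a sharp peak, which is consistent with a prediction resolved to a region rather than to a single base.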

MeSH terms

  • Arabidopsis* / genetics
  • Oryza* / genetics
  • Promoter Regions, Genetic
  • TATA Box / genetics
  • Transcription Initiation Site