Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots

PLoS One. 2013 Oct 29;8(10):e79011. doi: 10.1371/journal.pone.0079011. eCollection 2013.

Abstract

Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large-scale genome-wide report on the computational prediction of CPEs across eight plant genomes, intended to help better understand the assembly of the transcription initiation complex. The distribution of thirteen known CPEs across four monocots (Brachypodium distachyon, Oryza sativa ssp. japonica, Sorghum bicolor, Zea mays) and four dicots (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Glycine max) reveals the structural organization of the core promoter in relation to the TATA-box as well as with respect to other CPEs. The distribution of known CPE motifs with respect to the transcription start site (TSS) exhibited positional conservation within monocots and within dicots, with slight differences across all eight genomes. Further, a more refined subset of annotated genes, based on orthologs of the model monocot (O. sativa ssp. japonica) and dicot (A. thaliana) genomes, supported the positional distribution of these thirteen known CPEs. DNA free energy profiles provided evidence that the structural properties of promoter regions are distinct from those of non-regulatory genome sequence. These profiles also showed that monocot core promoters have lower DNA free energy than dicot core promoters. The comparison of monocot and dicot promoter sequences highlights both the similarities and differences in the core promoter architecture, irrespective of species-specific nucleotide bias. This study will be useful for future work related to genome annotation projects and can inspire research efforts aimed at better understanding the regulatory mechanisms of transcription.
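As an illustration of what a positional distribution of a CPE motif relative to the TSS involves, the following minimal Python sketch scans promoter sequences for an IUPAC consensus and tallies match offsets from the TSS. It is not the pipeline used in the study: the function names, the `tss_index` convention (0-based index of the TSS base, so offset 0 is the TSS itself), and the example TATA-box consensus `TATAWAWR` are assumptions chosen for demonstration only.

```python
import re
from collections import Counter

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    # A lookahead matches at every start position without consuming characters.
    return re.compile("(?=(" + body + "))")


def motif_positions(seq, consensus, tss_index):
    """Return motif start offsets relative to the TSS (hypothetical helper).

    tss_index is the 0-based index of the TSS base in seq, so an offset of
    -30 means the motif starts 30 bases upstream of the TSS.
    """
    pattern = consensus_to_regex(consensus)
    return [m.start() - tss_index for m in pattern.finditer(seq.upper())]


def position_histogram(promoters, consensus, tss_index):
    """Count motif start offsets across promoters that share one TSS index."""
    counts = Counter()
    for seq in promoters:
        counts.update(motif_positions(seq, consensus, tss_index))
    return counts


if __name__ == "__main__":
    # Synthetic 50-base promoter: motif at index 10, TSS base at index 40.
    demo = "G" * 10 + "TATAAAAG" + "C" * 22 + "A" + "C" * 9
    print(position_histogram([demo, demo], "TATAWAWR", tss_index=40))
```

A histogram like this, computed per genome over real promoter sequences, is one simple way to compare where a motif tends to sit relative to the TSS; a real analysis would also need to scan both strands and account for background nucleotide composition.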

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Arabidopsis / genetics
  • Base Sequence
  • Brachypodium
  • Computational Biology
  • Gene Expression Regulation, Plant*
  • Genes, Plant*
  • Genome, Plant*
  • Oryza / genetics
  • Plant Proteins / genetics
  • Populus / genetics
  • Promoter Regions, Genetic*
  • Sequence Analysis, DNA
  • Sorghum / genetics
  • TATA Box
  • Transcription Initiation Site
  • Zea mays / genetics

Substances

  • Plant Proteins

Grants and funding

This work is funded by USDA (1907-21000-030) and Gramene (DBI 0703908). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.