iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species

Nucleic Acids Res. 2022 Oct 14;50(18):10278-10289. doi: 10.1093/nar/gkac824.

Abstract

Promoters are consensus DNA sequences located near transcription start sites, and they play an important role in transcription initiation. Because of this role, identifying promoters is essential for characterizing gene expression. Numerous computational methods have been proposed to predict promoters, but they struggle to achieve satisfactory performance across multiple species. In this study, we propose a novel weighted average ensemble learning model, termed iPro-WAEL, for identifying promoters in multiple species, including Human, Mouse, E. coli, Arabidopsis, B. amyloliquefaciens, B. subtilis and R. capsulatus. Extensive benchmarking experiments show that iPro-WAEL outperforms current methods in promoter prediction. The experimental results also demonstrate that iPro-WAEL predicts satisfactorily across cell lines, on promoters annotated by other methods, and when distinguishing promoters from enhancers. Moreover, we identify the most important transcription factor binding site (TFBS) motif in promoter regions to facilitate the study of important motifs in these regions. The source code of iPro-WAEL is freely available at https://github.com/HaoWuLab-Bioinformatics/iPro-WAEL.
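
For readers unfamiliar with the term, the general idea of a weighted average ensemble can be sketched as follows. This is a minimal, generic illustration and not the iPro-WAEL implementation: the abstract does not describe the base learners or how their weights are chosen, so the function names, the two example models, their probabilities and the weights below are all hypothetical.

```python
from typing import List, Sequence


def weighted_average_ensemble(
    probabilities: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-model promoter probabilities into one score per sequence.

    probabilities[m][i] is the probability that model m assigns to
    sequence i being a promoter; weights[m] is that model's weight.
    (Hypothetical helper, for illustration only.)
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(probabilities[0])
    if any(len(p) != n for p in probabilities):
        raise ValueError("all models must score the same sequences")
    # Weighted sum of the model outputs, normalized by the total weight.
    return [
        sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        for i in range(n)
    ]


def predict_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a sequence as promoter (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical outputs of two base models on three candidate sequences.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.3]
scores = weighted_average_ensemble([model_a, model_b], weights=[0.75, 0.25])
labels = predict_labels(scores)
```

In this sketch the better-trusted model receives the larger weight, so its prediction dominates when the two models disagree. How such weights are actually selected in iPro-WAEL is described in the full paper and source code, not in this abstract.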

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Arabidopsis* / genetics
  • Arabidopsis* / metabolism
  • Computational Biology* / methods
  • Humans
  • Mice
  • Promoter Regions, Genetic
  • Software
  • Transcription Factors / genetics
  • Transcription Factors / metabolism
  • Transcription Initiation Site

Substances

  • Transcription Factors