DNA duplex stability as discriminative characteristic for Escherichia coli σ(54)- and σ(28)- dependent promoter sequences

Biologicals. 2014 Jan;42(1):22-8. doi: 10.1016/j.biologicals.2013.10.001. Epub 2013 Oct 28.

Abstract

The advent of modern high-throughput sequencing has made it possible to generate vast quantities of genomic sequence data. However, processing this volume of information, including the prediction of gene-coding and regulatory sequences, remains an important bottleneck in bioinformatics research. In this work, we integrated DNA duplex stability into the inputs of a Neural Network (NN) capable of predicting promoter regions with improved accuracy, specificity and sensitivity. We took our method beyond a simplistic analysis based on a single sigma subunit of RNA polymerase by incorporating the six main sigma subunits of Escherichia coli. The methodology successfully rediscovered known promoter sequences recognized by E. coli RNA polymerase subunits σ(24), σ(28), σ(32), σ(38), σ(54) and σ(70), with notable accuracies for σ(28)- and σ(54)-dependent promoter sequences (80% and 78.8%, respectively). Furthermore, discriminating promoters according to the σ factor made it possible to extract functional commonalities among the genes expressed from each type of promoter. DNA duplex stability emerges as a distinctive feature that improves the recognition and classification of σ(28)- and σ(54)-dependent promoter sequences. The findings presented in this report underscore the usefulness of incorporating DNA biophysical parameters into NN learning algorithms to increase accuracy, specificity and sensitivity in promoter prediction beyond what is accomplished based on sequence alone.
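
As a rough illustration of how a duplex stability feature might be derived from sequence, the sketch below scores each dinucleotide step with a nearest-neighbor free energy and averages the scores over a sliding window. This is only a minimal sketch under stated assumptions: the abstract does not say which thermodynamic parameter set, window size, or network encoding was used. The table values are illustrative figures approximating a commonly cited unified nearest-neighbor set (kcal/mol at 37 °C), and the names `NN_DG`, `dinucleotide_energies`, `stability_profile`, and the default window of 15 steps are hypothetical.

```python
# Illustrative nearest-neighbor free energies (kcal/mol, 37 degrees C).
# ASSUMPTION: these approximate a commonly cited unified parameter set;
# the abstract does not state which values the authors used.
NN_DG = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def dinucleotide_energies(seq):
    """Return the free energy of every overlapping dinucleotide step."""
    seq = seq.upper()
    energies = []
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in NN_DG:
            raise ValueError(f"unsupported dinucleotide {step!r} at position {i}")
        energies.append(NN_DG[step])
    return energies


def stability_profile(seq, window=15):
    """Mean free energy over a sliding window of `window` dinucleotide steps.

    More negative values indicate a more stable duplex. The window size is a
    hypothetical choice made for illustration only.
    """
    energies = dinucleotide_energies(seq)
    if window < 1 or window > len(energies):
        raise ValueError("window must be between 1 and the number of steps")
    return [
        sum(energies[i:i + window]) / window
        for i in range(len(energies) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical AT-rich fragment followed by a GC-rich fragment.
    example = "TATAATTTATATAAAGCGCGGCCGCGGC"
    for position, value in enumerate(stability_profile(example, window=10)):
        print(position, round(value, 3))
```

A profile of this kind could be supplied to a classifier alongside the encoded sequence, so that AT-rich, less stable stretches appear as a distinct numeric signal. Whether this matches the encoding used in the study is not stated in the abstract.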

Keywords: DNA duplex stability; Neural networks; Promoter prediction.

MeSH terms

  • DNA, Bacterial / genetics*
  • Escherichia coli / genetics*
  • Promoter Regions, Genetic*
  • Sigma Factor / genetics*

Substances

  • DNA, Bacterial
  • Sigma Factor