Bacterial promoter prediction: Selection of dynamic and static physical properties of DNA for reliable sequence classification

J Bioinform Comput Biol. 2018 Feb;16(1):1840003. doi: 10.1142/S0219720018400036. Epub 2018 Jan 30.

Abstract

Predicting the promoter activity of a DNA fragment is an important task in computational biology. Approaches that use physical properties of DNA to predict bacterial promoters have recently gained a lot of attention. To select an adequate set of physical properties for training a classifier, various characteristics of the DNA molecule should be taken into consideration. Here, we present a systematic approach that allows us to select less correlated properties for classification by means of both correlation and cophenetic coefficients as well as concordance matrices. As a proof of concept, we have developed the first classifier that uses not only the sequence and static physical properties of a DNA fragment, but also the dynamic properties of DNA open states. As a result, the best-performing models, with accuracy values of up to 90% for all types of sequences, were obtained. Furthermore, we have demonstrated that the classifier can serve as a reliable tool for distinguishing promoter DNA fragments from promoter islands despite the similarity of their nucleotide sequences.
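The idea of keeping only weakly correlated properties can be illustrated with a minimal sketch. It covers only the pairwise-correlation part of the selection and does not reproduce the cophenetic coefficients or concordance matrices used in the paper. The property names, the toy profile values, the 0.8 threshold, and the greedy selection rule are all assumptions made for illustration, not details taken from the abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile carries no correlation information.
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_uncorrelated(profiles, threshold=0.8):
    """Greedily keep a property only if its absolute correlation with
    every property kept so far is below the (assumed) threshold."""
    kept = []
    for name, values in profiles.items():
        if all(abs(pearson(values, profiles[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-position profiles of three DNA properties.
profiles = {
    "stacking_energy": [1.0, 2.0, 3.0, 4.0, 5.0],
    "duplex_stability": [2.1, 4.0, 6.2, 8.1, 9.9],  # nearly collinear with the first
    "open_state_activation": [3.0, 1.0, 4.0, 1.0, 5.0],
}

selected = select_uncorrelated(profiles)
print(selected)
```

In this toy data the second profile is almost a scaled copy of the first, so it is dropped, while the third is kept. A greedy pass depends on the order of the properties; a clustering-based selection, as the abstract's mention of cophenetic coefficients suggests, would avoid that dependence.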

Keywords: DNA physical properties; machine learning; promoter recognition.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • DNA, Bacterial / chemistry
  • DNA, Bacterial / classification*
  • DNA, Bacterial / genetics
  • Escherichia coli K12 / genetics*
  • Genome, Bacterial
  • Promoter Regions, Genetic*
  • Static Electricity

Substances

  • DNA, Bacterial