iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework

Bioinformatics. 2016 Aug 15;32(16):2411-8. doi: 10.1093/bioinformatics/btw186. Epub 2016 Apr 8.

Abstract

Motivation: Regulatory DNA elements are associated with DNase I hypersensitive sites (DHSs). Accordingly, identification of DHSs will provide useful insights for in-depth investigation into the function of noncoding genomic regions.

Results: In this study, using the strategy of ensemble learning framework, we proposed a new predictor called iDHS-EL for identifying the location of DHS in human genome. It was formed by fusing three individual Random Forest (RF) classifiers into an ensemble predictor. The three RF operators were respectively based on the three special modes of the general pseudo nucleotide composition (PseKNC): (i) kmer, (ii) reverse complement kmer and (iii) pseudo dinucleotide composition. It has been demonstrated that the new predictor remarkably outperforms the relevant state-of-the-art methods in both accuracy and stability.

Availability and implementation: For the convenience of most experimental scientists, a web server for iDHS-EL is established at http://bioinformatics.hitsz.edu.cn/iDHS-EL, which is the first web-server predictor ever established for identifying DHSs, and by which users can easily get their desired results without the need to go through the mathematical details. We anticipate that IDHS-EL: will become a very useful high throughput tool for genome analysis.

Contact: bliu@gordonlifescience.org or bliu@insun.hit.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms*
  • DNA
  • Deoxyribonuclease I*
  • Genomics
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Nucleotides*
  • Sequence Analysis, DNA

Substances

  • Nucleotides
  • DNA
  • Deoxyribonuclease I