Prediction of DNase I hypersensitive sites in plant genome using multiple modes of pseudo components

Anal Biochem. 2018 May 15:549:149-156. doi: 10.1016/j.ab.2018.03.025. Epub 2018 Mar 28.

Abstract

DNase I hypersensitive sites (DHSs) are accessible chromatin regions that are hypersensitive to cleavage by the DNase I endonuclease in plant genomes. DHSs have been used as markers for the presence of transcriptional regulatory elements. Developing computational methods to identify DHSs is therefore an important complement for discovering potential regulatory elements. Several machine learning approaches have been proposed for DHS prediction, but there is still room for improvement. In this work, a new predictor called pDHS-WE is proposed for predicting DHSs in plant genomes using a weighted ensemble learning framework. Five classes of heterogeneous features were used to represent the sequences, and one random forest (RF) classifier was built on each class. pDHS-WE was formed by fusing the five individual RF classifiers into an ensemble predictor, with a genetic algorithm used to obtain the weights of the different feature classes. In the experiments, pDHS-WE achieved an accuracy of 88.5%, a sensitivity of 89.1%, a specificity of 88.0%, and an AUC of 0.958, which were more than 2.7%, 2%, 3.5% and 2.6% higher, respectively, than those of state-of-the-art methods. These results suggest that pDHS-WE may become a useful tool for analyzing transcriptional regulatory elements in plant genomes.
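The weighted fusion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the fusion rule, the genetic algorithm settings, or the fitness function, so everything below is an assumption made for illustration: the classifiers' positive-class probabilities are combined by a weighted average, the fitness is plain accuracy at a 0.5 threshold, and the genetic algorithm uses elitism, uniform crossover and Gaussian mutation. The names `fuse`, `metrics`, `normalize` and `evolve_weights` and the toy data are hypothetical and are not taken from pDHS-WE.

```python
import random

def normalize(weights):
    """Scale positive weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def fuse(prob_rows, weights):
    """Weighted sum of per-classifier positive-class probabilities.

    Each row holds one probability per classifier; weights should sum to 1.
    """
    return [sum(w * p for w, p in zip(weights, row)) for row in prob_rows]

def metrics(scores, labels, threshold=0.5):
    """Return (accuracy, sensitivity, specificity) at a fixed threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        pred = 1 if score >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(labels)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity

def evolve_weights(prob_rows, labels, n_models, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm (assumed design) searching for fusion weights."""
    rng = random.Random(seed)

    def fitness(w):
        return metrics(fuse(prob_rows, normalize(w)), labels)[0]

    # Start from equal weights plus random candidates.
    population = [[1.0] * n_models]
    population += [[rng.random() + 1e-9 for _ in range(n_models)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n_models)
            # Gaussian mutation on one weight, clamped to stay positive.
            child[i] = min(1.0, max(1e-9, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    return normalize(max(population, key=fitness))

# Toy probabilities from five hypothetical classifiers (one column each).
PROBS = [
    [0.9, 0.8, 0.4, 0.7, 0.6],
    [0.7, 0.6, 0.3, 0.8, 0.2],
    [0.2, 0.3, 0.6, 0.1, 0.7],
    [0.4, 0.1, 0.7, 0.3, 0.8],
    [0.6, 0.7, 0.2, 0.4, 0.3],
    [0.3, 0.4, 0.8, 0.2, 0.6],
]
LABELS = [1, 1, 0, 0, 1, 0]

if __name__ == "__main__":
    best = evolve_weights(PROBS, LABELS, n_models=5)
    print("weights:", [round(w, 3) for w in best])
    print("acc, sn, sp:", metrics(fuse(PROBS, best), LABELS))
```

In a real pipeline the probability columns would come from classifiers trained on the five feature classes, and the weights would be tuned on held-out data rather than on the samples used for the final evaluation.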

Keywords: DNase I hypersensitive sites; Ensemble learning; Genetic algorithm; Heterogeneous features; Prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Arabidopsis / genetics*
  • Deoxyribonuclease I*
  • Genome, Plant*
  • Machine Learning*
  • Response Elements
  • Sequence Analysis, DNA / methods*

Substances

  • Deoxyribonuclease I