pDHS-SVM: A prediction method for plant DNase I hypersensitive sites based on support vector machine

J Theor Biol. 2017 Aug 7:426:126-133. doi: 10.1016/j.jtbi.2017.05.030. Epub 2017 May 26.

Abstract

DNase I hypersensitive sites (DHSs) are accessible chromatin regions that are hypersensitive to cleavage by DNase I endonucleases. DHSs are indicative of cis-regulatory DNA elements (CREs), all of which play important roles in global gene expression regulation, so recognizing DHSs in a genome helps to discover CREs. Developing cost-effective computational methods to identify DHSs is an important complement that can accelerate this investigation. However, tools for identifying DHSs in plant genomes are lacking. Here we present pDHS-SVM, a computational predictor of plant DHSs. To integrate global sequence-order information with local DNA properties, the feature space was constructed from the reverse complement kmer and the dinucleotide-based auto covariance of DNA sequences. Fifteen physical-chemical properties of dinucleotides were used, and a Support Vector Machine (SVM) was employed. To further improve the performance of the predictor and to extract an optimized subset of nucleotide physical-chemical properties that are beneficial for identifying DHSs, a heuristic nucleotide physical-chemical property selection algorithm was introduced. With the optimized subset of properties, experimental results on Arabidopsis thaliana and rice (Oryza sativa) showed that pDHS-SVM achieved accuracies of up to 87.00% and 85.79%, respectively. These results indicate the effectiveness of the proposed method for predicting DHSs. Furthermore, pDHS-SVM could provide a helpful complement for predicting CREs in plant genomes. Our implementation of pDHS-SVM is freely available as source code at https://github.com/shanxinzhang/pDHS-SVM.
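
The two feature families named above can be illustrated with a minimal Python sketch. It is not taken from the pDHS-SVM source code: the function names, the normalization of k-mer counts to frequencies, and the covariance formula (a commonly used definition of dinucleotide-based auto covariance) are assumptions made for illustration. The property table TOY_GC is a made-up placeholder (the GC count of each dinucleotide), not one of the fifteen physical-chemical properties used in the paper.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def revc_kmer_features(seq, k):
    """Frequencies of k-mers, with each k-mer merged with its reverse complement.

    Each k-mer is represented by the lexicographically smaller of itself and
    its reverse complement (an assumed convention for this sketch).
    """
    canonical = sorted({min(kmer, reverse_complement(kmer))
                        for kmer in map("".join, product("ACGT", repeat=k))})
    counts = dict.fromkeys(canonical, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with ambiguous bases
            counts[min(kmer, reverse_complement(kmer))] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in canonical]


def dac_features(seq, prop, max_lag):
    """Auto covariance of one dinucleotide property for lags 1..max_lag.

    `prop` maps each of the 16 dinucleotides to a numeric value.
    """
    if len(seq) < max_lag + 2:
        raise ValueError("sequence too short for the requested max_lag")
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    feats = []
    for lag in range(1, max_lag + 1):
        n = len(centered) - lag
        feats.append(sum(centered[i] * centered[i + lag] for i in range(n)) / n)
    return feats


# Hypothetical placeholder property: number of G/C bases in the dinucleotide.
TOY_GC = {a + b: (a in "GC") + (b in "GC") for a, b in product("ACGT", repeat=2)}

seq = "ACGTACGTAC"
# Concatenated vector: global k-mer composition plus local property correlation.
feature_vector = revc_kmer_features(seq, 2) + dac_features(seq, TOY_GC, 2)
```

In a setting like the one described, a vector of this kind would be computed for every sequence, with one block of auto covariance values per selected property, and passed to an SVM classifier. The choice of k, the maximum lag, and the property subset are tunable and are not specified here.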

Keywords: DHS; Dinucleotide-based auto covariance; Prediction; SVM.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Arabidopsis
  • Binding Sites
  • Deoxyribonuclease I / metabolism*
  • Dinucleotide Repeats
  • Genome, Plant / genetics*
  • Oryza
  • Regulatory Sequences, Nucleic Acid / genetics
  • Support Vector Machine*

Substances

  • Deoxyribonuclease I