pDHS-SVM: A prediction method for plant DNase I hypersensitive sites based on support vector machine

Shanxin Zhang; Zhiping Zhou; Xinmeng Chen; Yong Hu; Lindong Yang

doi:10.1016/j.jtbi.2017.05.030

J Theor Biol. 2017 Aug 7;426:126-133. doi: 10.1016/j.jtbi.2017.05.030. Epub 2017 May 26.

Authors

Shanxin Zhang¹, Zhiping Zhou², Xinmeng Chen³, Yong Hu², Lindong Yang²

Affiliations

¹ Engineering Research Center of IoT Technology Applications (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China. Electronic address: shanxinzhang@jiangnan.edu.cn.
² Engineering Research Center of IoT Technology Applications (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China.
³ School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China.

PMID: 28552554
DOI: 10.1016/j.jtbi.2017.05.030

Abstract

DNase I hypersensitive sites (DHSs) are accessible chromatin regions that are hypersensitive to cleavage by DNase I endonuclease. DHSs are indicative of cis-regulatory DNA elements (CREs), which play important roles in the global regulation of gene expression, so recognizing DHSs in a genome helps to discover CREs. Developing cost-effective computational methods to identify DHSs is an important complement that can accelerate this investigation. However, tools for identifying DHSs in plant genomes are lacking. Here we present pDHS-SVM, a computational predictor of plant DHSs. To integrate global sequence-order information with local DNA properties, the feature space was constructed from reverse complement k-mers and the dinucleotide-based auto covariance of DNA sequences. Fifteen physical-chemical properties of dinucleotides were used, and a Support Vector Machine (SVM) was employed as the classifier. To further improve the performance of the predictor and to extract an optimized subset of physical-chemical properties that are beneficial for DHS prediction, a heuristic property selection algorithm was introduced. With the optimized subset of properties, experiments on Arabidopsis thaliana and rice (Oryza sativa) showed that pDHS-SVM achieved accuracies of up to 87.00% and 85.79%, respectively. These results indicate the effectiveness of the proposed method for predicting DHSs, and pDHS-SVM could provide a helpful complement for predicting CREs in plant genomes. The source code of pDHS-SVM is freely available at https://github.com/shanxinzhang/pDHS-SVM.
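
The two feature types named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the repository linked above). It assumes uppercase sequences containing only A, C, G and T, and it uses the commonly cited definitions of reverse complement k-mer frequencies and dinucleotide-based auto covariance. All function names are hypothetical, and `TOY_GC` (the number of G or C bases in a dinucleotide) is a placeholder standing in for one of the fifteen physical-chemical properties, whose values are not given here.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k):
    """Sorted k-mers, one representative per reverse-complement pair."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(m, reverse_complement(m)) for m in kmers})


def revcomp_kmer_features(seq, k):
    """Frequencies of k-mers, counting each k-mer with its reverse complement."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[min(kmer, reverse_complement(kmer))] += 1
    total = sum(counts.values())
    return [counts[m] / total for m in canonical_kmers(k)]


def dac_features(seq, prop, max_lag):
    """Auto covariance of one dinucleotide property for lags 1..max_lag."""
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    feats = []
    for lag in range(1, max_lag + 1):
        n = len(centered) - lag  # number of dinucleotide pairs at this lag
        feats.append(sum(centered[i] * centered[i + lag] for i in range(n)) / n)
    return feats


# Placeholder property (assumption): count of G/C bases in each dinucleotide.
TOY_GC = {a + b: (a in "GC") + (b in "GC") for a in "ACGT" for b in "ACGT"}

seq = "ACGTTGCAAC"
features = revcomp_kmer_features(seq, 2) + dac_features(seq, TOY_GC, 2)
print(len(features))
```

In a full pipeline, `dac_features` would be called once per selected property and the concatenated vector would be passed to an SVM classifier. The sequence must be longer than `max_lag + 1` bases so that every lag has at least one dinucleotide pair.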

Keywords: DHS; Dinucleotide-based auto covariance; Prediction; SVM.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Arabidopsis
Binding Sites
Deoxyribonuclease I / metabolism*
Dinucleotide Repeats
Genome, Plant / genetics*
Oryza
Regulatory Sequences, Nucleic Acid / genetics
Support Vector Machine*

Substances

Deoxyribonuclease I