A novel sequence-based method of predicting protein DNA-binding residues, using a machine learning approach

Yudong Cai; Zhisong He; Xiaohe Shi; Xiangying Kong; Lei Gu; Lu Xie

doi:10.1007/s10059-010-0093-0

Mol Cells. 2010 Aug;30(2):99-105. doi: 10.1007/s10059-010-0093-0. Epub 2010 Jul 23.

Authors

Yudong Cai¹, Zhisong He, Xiaohe Shi, Xiangying Kong, Lei Gu, Lu Xie

Affiliation

¹ Institute of System Biology, Shanghai University, Shanghai, 200244, People's Republic of China. cai_yud@yahoo.com.cn

PMID: 20706794

Abstract

Protein-DNA interactions play an essential role in transcriptional regulation, DNA repair, and many other vital biological processes. The mechanism of protein-DNA binding, however, remains unclear. Studying many diseases requires a better understanding of the amino acid motifs that recognize DNA. Because identifying these motifs experimentally is expensive and time-consuming, a computational prediction approach is needed. Several in silico methods have been developed, but they still have considerable limitations. In this study, we used a machine learning approach to develop a new sequence-based method for predicting protein-DNA binding residues. As features, we used properties of the micro-environment of each amino acid, taken from the AAindex database, together with conservation scores. Evaluated by cross-validation, the method achieved an overall accuracy of 94.89%. These results show that the amino acid micro-environment is important for DNA binding and that it can be used to identify protein-DNA binding sites.
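The abstract does not give the window size, the exact AAindex properties, or the classifier, so the following is only a minimal sketch of the general idea under stated assumptions. It encodes each residue's "micro-environment" as the values of one physicochemical property plus a per-residue conservation score at every position in a sliding window centred on that residue, then estimates overall accuracy with leave-one-out cross-validation. The single property used here (the Kyte-Doolittle hydropathy scale, one of the scales collected in AAindex), the half-width of 3, zero-padding at the chain ends, and the 1-nearest-neighbour classifier are all illustrative assumptions. The function names `window_features` and `loo_nearest_neighbor_accuracy` are hypothetical, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

# Assumption: a single AAindex-style property (Kyte-Doolittle hydropathy).
# The published method may use many AAindex properties instead.
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(seq: str, conservation: Sequence[float], i: int,
                    half_width: int = 3) -> List[float]:
    """Hypothetical micro-environment encoding of residue i.

    For every position in the window [i - half_width, i + half_width], append
    the property value and the conservation score. Positions beyond the chain
    ends are padded with 0.0 (an assumption).
    """
    if len(seq) != len(conservation):
        raise ValueError("need exactly one conservation score per residue")
    features: List[float] = []
    for j in range(i - half_width, i + half_width + 1):
        if 0 <= j < len(seq):
            features.append(KD.get(seq[j], 0.0))  # unknown residues -> 0.0
            features.append(conservation[j])
        else:
            features.extend([0.0, 0.0])  # padding outside the sequence
    return features


def loo_nearest_neighbor_accuracy(X: List[List[float]], y: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    Each sample is predicted with the label of its closest other sample
    (Euclidean distance); the fraction of correct predictions is returned.
    """
    if len(X) < 2 or len(X) != len(y):
        raise ValueError("need at least two labelled samples")
    correct = 0
    for i, xi in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: math.dist(xi, X[j]))
        correct += int(y[nearest] == y[i])
    return correct / len(X)
```

For example, `window_features("MKRKAEGS", [0.1, 0.9, 0.8, 0.7, 0.2, 0.3, 0.1, 0.4], 0, half_width=1)` returns `[0.0, 0.0, 1.9, 0.1, -3.9, 0.9]`: one padded position before the N-terminus, then methionine and lysine with their conservation scores. In a real pipeline the labels would mark known DNA-binding residues (for instance 1 for binding, 0 for non-binding), and because binding residues are usually a minority, overall accuracy is best read alongside per-class measures.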

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Artificial Intelligence*
DNA / metabolism*
DNA-Binding Proteins / metabolism*
Protein Binding
Sequence Analysis, Protein / methods*

Substances

DNA-Binding Proteins
DNA