A novel sequence-based method of predicting protein DNA-binding residues, using a machine learning approach

Mol Cells. 2010 Aug;30(2):99-105. doi: 10.1007/s10059-010-0093-0. Epub 2010 Jul 23.

Abstract

Protein-DNA interactions play an essential role in transcriptional regulation, DNA repair, and many other vital biological processes. The mechanism of protein-DNA binding, however, remains unclear. For the study of many diseases, researchers must improve their understanding of the amino acid motifs that recognize DNA. Because identifying these motifs experimentally is expensive and time-consuming, computational prediction approaches are needed. Some in silico methods have been developed, but they still have considerable limitations. In this study, we used a machine learning approach to develop a new sequence-based method of predicting protein-DNA binding residues. To make these predictions, we used properties of the micro-environment of each amino acid, taken from the AAindex database, together with conservation scores. Under cross-validation, the method achieved an overall accuracy of 94.89%. Our results indicate that the amino acid micro-environment is important for DNA binding and that it can be used to identify protein-DNA binding sites.
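
The abstract does not give the window size, the specific AAindex properties, or the classifier, so the following is only a minimal sketch of how a residue's micro-environment could be encoded from sequence alone, not the authors' implementation. It assumes a sliding window centered on each residue, uses the Kyte-Doolittle hydropathy scale as a stand-in for one AAindex property, and takes per-residue conservation scores as precomputed input. The names `window_features`, `half_window`, and `pad` are hypothetical.

```python
from typing import List, Sequence

# Kyte-Doolittle hydropathy values, used here only as an illustrative
# stand-in for a physicochemical property; the abstract does not say
# which AAindex entries were used.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(
    sequence: str,
    conservation: Sequence[float],
    half_window: int = 2,  # assumed value, not taken from the paper
    pad: float = 0.0,      # filler for positions beyond the chain ends
) -> List[List[float]]:
    """Build one feature vector per residue from its sequence neighborhood.

    Each vector holds (property, conservation) pairs for the residue and
    its half_window neighbors on either side.
    """
    if len(sequence) != len(conservation):
        raise ValueError("sequence and conservation must have equal length")

    n = len(sequence)
    features = []
    for i in range(n):
        row = []
        for j in range(i - half_window, i + half_window + 1):
            if 0 <= j < n:
                row.append(HYDROPATHY[sequence[j]])
                row.append(conservation[j])
            else:
                # Window extends past the N- or C-terminus.
                row.extend([pad, pad])
        features.append(row)
    return features


# Toy input: a three-residue peptide with made-up conservation scores.
rows = window_features("AKR", [0.1, 0.9, 0.5], half_window=1)
```

Each row could then be passed, together with a binding or non-binding label for the central residue, to whatever classifier is chosen and evaluated by cross-validation.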

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Artificial Intelligence*
  • DNA / metabolism*
  • DNA-Binding Proteins / metabolism*
  • Protein Binding
  • Sequence Analysis, Protein / methods*

Substances

  • DNA-Binding Proteins
  • DNA