A Sequence-Based Dynamic Ensemble Learning System for Protein Ligand-Binding Site Prediction

IEEE/ACM Trans Comput Biol Bioinform. 2016 Sep-Oct;13(5):901-912. doi: 10.1109/TCBB.2015.2505286. Epub 2015 Dec 3.

Abstract

Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, as in protein-ligand binding. Accurate prediction of the protein residues that physically bind to ligands is important for drug design and protein docking studies. Most successful protein-ligand binding predictions have been based on known structures. However, structural information is often unavailable in practice because of the large gap between the number of known protein sequences and the number of experimentally solved structures.

Results: This paper proposes a dynamic ensemble approach that identifies protein-ligand binding residues using sequence information only. To avoid problems caused by the severe imbalance between ligand-binding and non-ligand-binding sites, we constructed several balanced data sets and trained a random forest classifier on each of them. For each query protein, we dynamically selected a subset of classifiers according to the similarity between the target protein and the proteins in the training data set. Combining the predictions of this classifier subset yielded the final predictions for the query protein. The ensemble of these classifiers formed a sequence-based predictor of protein-ligand binding sites.
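
Illustrative sketch (not the authors' implementation): the Python code below shows one way the three steps described in the Results could fit together, namely building balanced data sets, training one classifier per set, and selecting classifiers per query by similarity. The abstract gives no implementation details, so everything here is an assumption: the record layout (protein_id, feature_vector, label), the function and class names, the use of difflib's SequenceMatcher as a crude stand-in for a sequence similarity measure, the ranking rule (mean similarity to the proteins behind each classifier's training residues), and the averaging of scores. A simple centroid classifier stands in for the random forest so that the sketch needs only the standard library.

```python
import random
from difflib import SequenceMatcher
from statistics import mean

# Hypothetical residue record: (protein_id, feature_vector, label); label 1 = binding.


def balanced_subsets(records, n_subsets, seed=0):
    """Pair all binding residues with an equal-sized random draw of non-binding ones."""
    rng = random.Random(seed)
    pos = [r for r in records if r[2] == 1]
    neg = [r for r in records if r[2] == 0]
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_subsets)]


class CentroidClassifier:
    """Stand-in for a random forest (assumption): scores by distance to class centroids."""

    def fit(self, records):
        # Remember which protein each training residue came from, for later selection.
        self.protein_ids = [r[0] for r in records]
        self.centroids = {}
        for label in (0, 1):
            rows = [r[1] for r in records if r[2] == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_score(self, x):
        # Euclidean distance from x to each class centroid.
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            for label, c in self.centroids.items()
        }
        total = dist[0] + dist[1]
        # Closer to the binding centroid gives a score nearer to 1.
        return 0.5 if total == 0 else dist[0] / total


def similarity(seq_a, seq_b):
    """Crude similarity in [0, 1]; a real system would likely use alignment scores."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()


def select_classifiers(query_seq, classifiers, sequences, k):
    """Keep the k classifiers whose training residues come from the most similar proteins."""
    def relevance(clf):
        return mean(similarity(query_seq, sequences[p]) for p in clf.protein_ids)
    return sorted(classifiers, key=relevance, reverse=True)[:k]


def predict(query_seq, query_features, classifiers, sequences, k=2, threshold=0.5):
    """Average the selected classifiers' scores per residue and threshold them."""
    chosen = select_classifiers(query_seq, classifiers, sequences, k)
    scores = [mean(c.predict_score(x) for c in chosen) for x in query_features]
    return [int(s >= threshold) for s in scores]
```

In this sketch, undersampling the non-binding class gives every classifier a 1:1 class ratio, and because each subset draws different non-binding residues, the classifiers differ enough for per-query selection to matter. The selection size k and the decision threshold are hypothetical parameters, not values taken from the paper.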

Conclusions: Experimental results on two Critical Assessment of protein Structure Prediction (CASP) datasets and the ccPDB dataset demonstrated that our proposed method compared favorably with the state of the art.

Availability: http://www2.ahu.edu.cn/pchen/web/LigandDSES.htm.

Publication types

  • Evaluation Study

MeSH terms

  • Algorithms
  • Binding Sites
  • Data Interpretation, Statistical
  • Ligands
  • Machine Learning*
  • Models, Chemical
  • Molecular Docking Simulation / methods*
  • Pattern Recognition, Automated / methods*
  • Protein Binding
  • Protein Interaction Mapping / methods*
  • Proteins / chemistry*
  • Sequence Analysis, Protein / methods*

Substances

  • Ligands
  • Proteins