Improvement of protein binding sites prediction by selecting amino acid residues' features

J Struct Biol. 2015 Jan;189(1):9-19. doi: 10.1016/j.jsb.2014.11.007. Epub 2014 Dec 3.

Abstract

One of the main focuses of the bioinformatics community is the study of the relationship between the structure of protein molecules and their functions. In the literature, there are various methods that consider different protein-derived information for predicting protein functions. In our research, we focus on predicting protein binding sites, which can be used to functionally annotate protein structures. In this paper, we consider a set of sixteen amino acid residue features and estimate their significance by applying various feature selection techniques. Although the number of features in our case is not high, we perform feature selection in order to improve the predictive power and reduce the time complexity of the prediction models. The results show that applying a proper feature selection technique improves the predictive performance of the classification algorithms; that is, by considering only the most relevant features we induce more accurate models than when we consider the entire set of features. Furthermore, feature selection decreases the model complexity as well as the training and testing times. We also compare our approach with several existing methods for protein binding site prediction. The results demonstrate that the existing methods considered in this research are specific and applicable to the group of proteins for which the model was developed, while our approach is more generic and can be applied to a wider class of proteins.

Keywords: Feature selection technique (FST); Feature transformation technique; Protein binding site; Protein function; Protein interaction.
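
The abstract does not name the specific feature selection techniques or classifiers used, so the following is only a minimal sketch of the general idea of a filter-style feature selection step over sixteen per-residue features. It ranks features by a Fisher score, which is an illustrative choice and not necessarily one of the techniques evaluated in the paper. The data are synthetic, and the function names (fisher_score, select_top_k, make_toy_data) and the indices of the informative features are hypothetical.

```python
import random
import statistics


def fisher_score(values, labels):
    """Fisher score of one feature for a binary label:
    (mean_pos - mean_neg)^2 / (var_pos + var_neg)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    num = (statistics.fmean(pos) - statistics.fmean(neg)) ** 2
    den = statistics.pvariance(pos) + statistics.pvariance(neg)
    return num / den if den > 0 else 0.0


def select_top_k(X, y, k):
    """Score every feature column and return the indices of the k best,
    together with the full list of scores."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


def make_toy_data(n_samples=400, n_features=16, informative=(0, 5, 11), seed=0):
    """Synthetic stand-in for residue features (assumption, not real data):
    label 1 = binding-site residue, label 0 = non-binding residue.
    Only the 'informative' columns are shifted for label-1 samples."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected, scores = select_top_k(X, y, k=3)
print("selected feature indices:", selected)
```

On this toy data the three shifted columns score far higher than the pure-noise columns, so a classifier trained only on the selected columns would see fewer inputs without losing the signal. This mirrors the reduction in model complexity and training time that the abstract reports, although the actual gains depend on the real features and models.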

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / genetics*
  • Binding Sites / genetics*
  • Computational Biology / methods*
  • Databases, Genetic
  • Models, Genetic*
  • Principal Component Analysis
  • Protein Interaction Mapping / methods*
  • Proteins / genetics*

Substances

  • Amino Acids
  • Proteins