A novel feature extraction scheme for prediction of protein-protein interaction sites

Mol Biosyst. 2015 Feb;11(2):475-85. doi: 10.1039/c4mb00625a. Epub 2014 Nov 21.

Abstract

Identifying protein-protein interaction (PPI) sites is an important and challenging problem in several areas of biology. Although many methods have been proposed, the problem remains far from solved. Here, a feature selection approach that combines a sliding window of size 11 with the random forest algorithm is proposed, called DX-RF. The method achieves an accuracy of 88.79%, a recall of 82.09%, and a precision of 85.76% with the 34 top-ranked features on the Hetero test dataset, and an accuracy of 91.6%, a precision of 89.2%, and a recall of 83.54% with the 25 top-ranked features on the Homo test dataset. Compared with other methods, these results indicate that DX-RF is effective at selecting relevant features to achieve higher performance. Moreover, to further understand protein interactions, a feature analysis is also performed.
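As a rough illustration of two ideas named in the abstract, the sketch below shows how a window of size 11 might be centered on each position of a sequence, and how accuracy, precision, and recall are computed from binary predictions. It is not the DX-RF implementation: the abstract does not describe how windows are built or padded, so the function names `sliding_windows` and `binary_metrics`, the padding character `X`, and the example sequence and labels are all assumptions made for illustration. The feature ranking and random forest steps are omitted.

```python
from typing import List, Sequence, Tuple


def sliding_windows(sequence: str, size: int = 11, pad: str = "X") -> List[str]:
    """Return one window per position, centered on that position.

    Assumption: sequence ends are padded with a placeholder character so
    that every position gets a full-length window.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) for labels where 1 marks an interaction site."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical example data, not taken from the paper.
windows = sliding_windows("ACDEFGHIKLMNP")
accuracy, precision, recall = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In a pipeline of this kind, each window would typically be encoded as a numeric feature vector before being passed to a classifier; the encoding used by DX-RF is not given in the abstract.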

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Databases, Protein
  • Models, Molecular
  • Protein Interaction Mapping / methods*
  • ROC Curve