Prediction of protein-protein interaction sites by means of ensemble learning and weighted feature descriptor

J Biol Res (Thessalon). 2016 Jul 4;23(Suppl 1):10. doi: 10.1186/s40709-016-0046-7. eCollection 2016 May.

Abstract

Background: Reliable prediction of protein-protein interaction sites is an important goal in the field of bioinformatics. Many computational methods have been explored for the large-scale prediction of protein-protein interaction sites based on various data types, including protein sequence, structural and genomic data. Although much progress has been achieved in recent years, the problem has not yet been satisfactorily solved.

Results: In this work, we present an efficient approach that uses an ensemble learning algorithm with a weighted feature descriptor (EL-WFD) to predict protein-protein interaction sites. The weighted feature descriptor was designed to describe the distance influence of neighboring residues on interaction sites. The results on two datasets (Hetero and Homo) show that the proposed method yields satisfactory accuracy, with 83.8 % recall and 96.3 % precision on the Hetero dataset and 84.2 % recall and 96.3 % precision on the Homo dataset. On both datasets, our method tends to obtain a higher Matthews correlation coefficient than the state-of-the-art random forest method.
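
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how recall, precision and the Matthews correlation coefficient are conventionally computed from the counts of a binary confusion matrix (interaction site versus non-interaction site). The function names and the example counts are illustrative assumptions; they are not taken from the paper and do not reproduce its results.

```python
import math


def recall(tp: int, fn: int) -> float:
    """Fraction of true interaction sites that were predicted as sites."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted interaction sites that are true sites."""
    return tp / (tp + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to +1.

    Returns 0.0 when the denominator is zero, a common convention
    for degenerate confusion matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to illustrate the calculation.
tp, tn, fp, fn = 80, 90, 10, 20
print(f"recall    = {recall(tp, fn):.3f}")
print(f"precision = {precision(tp, fp):.3f}")
print(f"MCC       = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike recall and precision, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is often preferred when interaction and non-interaction residues are unevenly represented.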

Conclusions: The experimental results show that the EL-WFD method is effective in predicting protein-protein interaction sites. The novel weighted feature descriptor proved promising for discovering interaction sites. Overall, the proposed method can be considered a powerful new tool for predicting protein-protein interaction sites with excellent performance.