An Efficient Feature Extraction Technique Based on Local Coding PSSM and Multifeatures Fusion for Predicting Protein-Protein Interactions

Evol Bioinform Online. 2019 Oct 3:15:1176934319879920. doi: 10.1177/1176934319879920. eCollection 2019.

Abstract

Background: Increasing evidence has indicated that protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of a cell. Thus, continuing to uncover potential PPIs is an important topic in the biomedical domain. Although various feature extraction methods with machine learning approaches have enhanced the prediction of PPIs. There remains room for improvement by developing novel and effective feature extraction methods and classifier approaches to identify PPIs.

Method: In this study, we proposed a sequence-based feature extraction method called LCPSSMMF, which combined local coding position-specific scoring matrix (PSSM) with multifeatures fusion. First, we used a novel local coding method based on PSSM to build a new PSSM (CPSSM); the advantage of this method is that it incorporated global and local feature extraction, which can account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. Second, we adopted 2 different feature extraction methods (Local Average Group [LAG] and Bigram Probability [BP]) to capture multiple key feature information by employing the evolutionary information embedded in the CPSSM matrix. Finally, feature vectors were acquired by using multifeatures fusion method.

Result: To evaluate the performance of the proposed feature extraction approach, we employed support vector machine (SVM) as a prediction classifier and applied this method to yeast and human PPI datasets. The prediction accuracies of LCPSSMMF were 93.43% and 90.41% on the yeast and human datasets, respectively. Moreover, we also compared the proposed method with the previous sequence-based approaches on the yeast datasets by using the same SVM classifier. The experimental results indicated that the performance of LCPSSMMF significantly exceeded that of several other state-of-the-art methods. It is proven that the LCPSSMMF approach can capture more local and global discriminatory information than almost all previous methods and can function remarkably well in identifying PPIs. To facilitate extensive research in future proteomics studies, we developed a LCPSSMMFSVM server, which is freely available for academic use at http://219.219.62.123:8888/LCPSSMMFSVM.

Keywords: Protein-protein interactions; local coding; multifeatures fusion; position-specific scoring matrix; support vector machine.