A novel matrix of sequence descriptors for predicting protein-protein interactions from amino acid sequences

PLoS One. 2019 Jun 7;14(6):e0217312. doi: 10.1371/journal.pone.0217312. eCollection 2019.

Abstract

Protein-protein interactions (PPIs) play an important role in the life activities of organisms. With the availability of large amounts of protein sequence data, PPI prediction methods have attracted increasing attention. A variety of protein sequence coding methods have emerged, but models built on them are particularly time-consuming to train. To address this issue, we propose a novel matrix sequence coding method. Using a deep neural network (DNN) and a novel matrix protein sequence descriptor, we constructed a model for predicting PPIs. When applied to human PPI data, the method achieved an accuracy of 94.34%, a recall of 98.28%, an area under the curve (AUC) of 97.79% and a loss of 23.25%. When the model was evaluated on a non-redundant dataset, the prediction accuracy was 88.29%. These results indicate that the matrix of sequence (MOS) descriptor can improve the accuracy of PPI prediction and reduce training time, making it a useful complement to future proteomics research. The experimental code and experimental results can be found at https://github.com/smalltalkman/hppi-tensorflow.
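
For readers less familiar with the evaluation metrics reported above, the following is a minimal sketch of how accuracy, recall and AUC are typically computed for a binary interaction classifier. It is not taken from the authors' repository: the function names, the 0.5 decision threshold and the toy labels and scores are all illustrative assumptions, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library routine.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of protein pairs whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of truly interacting pairs (label 1) that were predicted as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise-ranking definition:
    the probability that a random positive scores higher than a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = interacting pair, 0 = non-interacting pair.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # assumed model output probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"recall   = {recall(y_true, y_pred):.4f}")
print(f"AUC      = {auc(y_true, scores):.4f}")
```

Accuracy and recall depend on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives, which is why the three figures are usually reported together.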

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Protein*
  • Deep Learning*
  • Humans
  • Models, Genetic
  • Protein Interaction Maps*
  • Sequence Analysis, Protein*

Grants and funding

This work is supported by grants from the National Natural Science Foundation of China (61773360 and 31671586).