Accurate prediction of protein-ATP binding residues using position-specific frequency matrix

Anal Biochem. 2021 Aug 1:626:114241. doi: 10.1016/j.ab.2021.114241. Epub 2021 May 7.

Abstract

Knowledge of protein-ATP interactions can aid protein functional annotation and drug discovery. Accurately identifying protein-ATP binding residues is an important but challenging step toward understanding these interactions, especially when only protein sequence information is available. In this study, we propose a novel method, named DeepATPseq, to predict protein-ATP binding residues without using any information about protein three-dimensional structure or sequence-derived structural information. In DeepATPseq, the HHBlits-generated position-specific frequency matrix (PSFM) profile is first used to extract the feature information of each residue. Then, for each residue, the PSFM-based feature is fed into two prediction models, built separately with a deep convolutional neural network (DCNN) and a support vector machine (SVM). The final ATP-binding probability of the residue is calculated as the weighted sum of the outputs of the DCNN-based and SVM-based models. Experimental results on the independent validation data set demonstrate that DeepATPseq achieves an accuracy of 77.71%, covering 57.42% of all ATP-binding residues, with a Matthews correlation coefficient (0.655) that is significantly higher than that of existing sequence-based methods and comparable to that of state-of-the-art structure-based predictors. Detailed data analysis shows that the major advantage of DeepATPseq lies in the combined use of DCNN and SVM, which helps extract more discriminative information from the PSFM profiles. The online server and standalone package of DeepATPseq are freely available at https://jun-csbio.github.io/DeepATPseq/ for academic use.
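The weighted-sum combination and the Matthews correlation coefficient mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the DeepATPseq implementation: the function names, the weight `w = 0.5`, the decision threshold of 0.5, and the toy scores are all hypothetical, since the abstract states neither the weights nor the threshold actually used.

```python
import math

def combine_scores(p_dcnn, p_svm, w=0.5):
    """Weighted sum of two per-residue probabilities.

    w is a hypothetical weight on the DCNN output; the SVM output
    gets 1 - w. The abstract does not report the actual weights.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if len(p_dcnn) != len(p_svm):
        raise ValueError("score lists must have equal length")
    return [w * a + (1.0 - w) * b for a, b in zip(p_dcnn, p_svm)]

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy per-residue scores (made up for illustration only).
p_dcnn = [0.9, 0.2, 0.8, 0.1]
p_svm = [0.7, 0.4, 0.4, 0.1]
y_true = [1, 0, 0, 0]

combined = combine_scores(p_dcnn, p_svm, w=0.5)
# Hypothetical decision threshold of 0.5 on the combined probability.
predicted = [1 if p >= 0.5 else 0 for p in combined]
score = mcc(y_true, predicted)
```

In practice the weight and threshold would be tuned on training or validation data rather than fixed in advance.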

Keywords: Deep convolutional neural network; Protein sequence information; Protein-ATP binding residues; Support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenosine Triphosphate / metabolism*
  • Algorithms*
  • Computational Biology / methods*
  • Humans
  • Neural Networks, Computer*
  • Protein Binding
  • Proteins / chemistry
  • Proteins / metabolism*

Substances

  • Proteins
  • Adenosine Triphosphate