Prediction of protein mononucleotide binding sites using AlphaFold2 and machine learning

Comput Biol Chem. 2022 Oct:100:107744. doi: 10.1016/j.compbiolchem.2022.107744. Epub 2022 Jul 23.

Abstract

In this study, we developed a system that predicts the binding sites of proteins for five mononucleotides (AMP, ADP, ATP, GDP, and GTP). The system comprises two machine learning (ML)-based predictors, one using a convolutional neural network and one using a gradient boosting machine; two template-based predictors, one based on sequence alignment and one on structure alignment; and a predictor that performs ensemble learning over these four predictors. We performed data augmentation using the binding sites of ligands with similar structures. For example, in the prediction of ADP-binding sites using the ML methods, the binding sites of AMP and ATP, which have similar structures, are also considered. In addition, we constructed structure models using AlphaFold2, a highly accurate protein structure prediction method. The secondary structure and dihedral angle information obtained from the model structures were used as features for the ML predictors. In the template-based predictor, the structures of binding sites were used as templates and searched by structure alignment to identify the binding site of the target. Among the four individual predictors, the template-based predictor based on structure alignment showed the best performance, and the ensemble predictor achieved the best performance overall, with an area under the curve of 0.958 for all mononucleotides.
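As a rough illustration of the pipeline described above, the following sketch shows how per-residue scores from four predictors might be combined and evaluated with the area under the ROC curve. It is not the authors' implementation. The abstract does not say how the ensemble is learned, so a plain weighted mean stands in for it here. The ligand grouping beyond the ADP example (AMP and ATP), the function names, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Ligands treated as structurally similar for data augmentation.
# Only the ADP -> (AMP, ATP) case is stated in the abstract; the other
# entries are assumptions made for this sketch.
SIMILAR_LIGANDS: Dict[str, List[str]] = {
    "AMP": ["ADP", "ATP"],
    "ADP": ["AMP", "ATP"],
    "ATP": ["AMP", "ADP"],
    "GDP": ["GTP"],
    "GTP": ["GDP"],
}


def augmented_ligands(target: str) -> List[str]:
    """Return the target ligand followed by its assumed similar ligands."""
    return [target] + SIMILAR_LIGANDS.get(target, [])


def ensemble_scores(
    per_predictor: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine per-residue scores with a weighted mean (a stand-in for the
    ensemble learner, whose form the abstract does not specify)."""
    if not per_predictor:
        raise ValueError("at least one predictor is required")
    if weights is None:
        weights = [1.0] * len(per_predictor)
    if len(weights) != len(per_predictor):
        raise ValueError("one weight per predictor is required")
    n_residues = len(per_predictor[0])
    if any(len(scores) != n_residues for scores in per_predictor):
        raise ValueError("all predictors must score the same residues")
    total = float(sum(weights))
    return [
        sum(w * scores[i] for w, scores in zip(weights, per_predictor)) / total
        for i in range(n_residues)
    ]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a binding residue outscores a
    non-binding residue, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy per-residue scores for a four-residue protein (made-up numbers).
cnn_scores = [0.9, 0.2, 0.6, 0.1]        # convolutional neural network
gbm_scores = [0.8, 0.3, 0.4, 0.2]        # gradient boosting machine
seq_template = [1.0, 0.0, 0.0, 0.0]      # sequence-alignment template hit
struct_template = [1.0, 0.0, 1.0, 0.0]   # structure-alignment template hit
labels = [1, 0, 1, 0]                    # 1 = binding residue

combined = ensemble_scores(
    [cnn_scores, gbm_scores, seq_template, struct_template]
)
print(augmented_ligands("ADP"), combined, roc_auc(labels, combined))
```

In practice the weights would be fitted on held-out data rather than fixed, and a library routine such as scikit-learn's `roc_auc_score` would normally replace the hand-written AUC.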

Keywords: AlphaFold2; Binding site; Machine learning; Mononucleotide; Proteins; Structure alignment.

MeSH terms

  • Adenosine Diphosphate / metabolism
  • Adenosine Monophosphate / metabolism
  • Adenosine Triphosphate
  • Binding Sites
  • Ligands
  • Machine Learning*
  • Protein Binding
  • Proteins* / chemistry

Substances

  • Ligands
  • Proteins
  • Adenosine Monophosphate
  • Adenosine Diphosphate
  • Adenosine Triphosphate