The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks

Int J Mol Sci. 2023 Nov 9;24(22):16120. doi: 10.3390/ijms242216120.

Abstract

Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein–ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether the problem is truly solved or whether the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, we find that it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we confirm the bias in the test sets currently in common use. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and to compare model performance accurately.

Keywords: binding affinities; deep learning; protein–ligand.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Ligands
  • Neural Networks, Computer*
  • Protein Binding

Substances

  • Ligands
