The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks

Pierre-Yves Libouban; Samia Aci-Sèche; Jose Carlos Gómez-Tamayo; Gary Tresadern; Pascal Bonnet

doi:10.3390/ijms242216120

Int J Mol Sci. 2023 Nov 9;24(22):16120. doi: 10.3390/ijms242216120.

Authors

Pierre-Yves Libouban¹, Samia Aci-Sèche¹, Jose Carlos Gómez-Tamayo², Gary Tresadern², Pascal Bonnet¹

Affiliations

¹ Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d'Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France.
² Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium.

Abstract

Artificial intelligence (AI) has gained significant traction in drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein–ligand binding affinities. Despite advances in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau. This raises the question of whether the problem is truly solved, or whether current performance is overly optimistic and reliant on biased, easily predictable data. As with other DL problems, this issue appears to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models. Furthermore, training a model on as much data as possible is more important than restricting training to high-quality datasets only. Finally, we confirm the bias in the test sets currently in common use. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and to compare model performance accurately.
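To make the notion of binding-pocket size concrete, one common convention (assumed here; the abstract does not state how the pocket is defined) treats the pocket as every protein atom lying within a distance cutoff of any ligand atom, so a larger cutoff yields a larger pocket for the model to see. The sketch below is illustrative only: the function name `pocket_atoms`, the cutoff values, and the toy coordinates are hypothetical and are not the authors' method.

```python
import math
from typing import List, Tuple

# Hypothetical representation: each atom is an (x, y, z) coordinate in angstroms.
Coord = Tuple[float, float, float]


def pocket_atoms(protein: List[Coord], ligand: List[Coord], cutoff: float) -> List[int]:
    """Return indices of protein atoms within `cutoff` angstroms of any ligand atom.

    Assumed distance-cutoff pocket definition; a larger cutoff selects a larger pocket.
    """
    selected = []
    for i, p in enumerate(protein):
        # Keep the protein atom if it is close enough to at least one ligand atom.
        if any(math.dist(p, lig) <= cutoff for lig in ligand):
            selected.append(i)
    return selected


# Toy example with made-up coordinates: four protein atoms along the x-axis,
# one ligand atom at x = 1.
protein = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0)]

for cutoff in (4.0, 6.0, 12.0):
    print(cutoff, pocket_atoms(protein, ligand, cutoff))
```

Sweeping the cutoff in this way is one simple means of varying how much of the protein environment a model receives as input; the cutoffs actually studied in the paper are not given in this abstract.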

Keywords: binding affinities; deep learning; protein–ligand.

MeSH terms

Algorithms
Artificial Intelligence*
Ligands
Neural Networks, Computer*
Protein Binding

Substances

Ligands

Grants and funding

262402/Janssen (Belgium)