A Framework for Improving the Generalizability of Drug-Target Affinity Prediction Models

Riza Özçelik; Alperen Bağ; Berk Atil; Melih Barsbey; Arzucan Özgür; Elif Ozkirimli

doi:10.1089/cmb.2023.0208


J Comput Biol. 2023 Nov;30(11):1226-1239. doi: 10.1089/cmb.2023.0208.

Authors

Riza Özçelik¹, Alperen Bağ², Berk Atil¹, Melih Barsbey¹, Arzucan Özgür¹, Elif Ozkirimli³

Affiliations

¹ Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey.
² Technical University of Munich, Munich, Germany.
³ Roche Informatics, F. Hoffmann-La Roche AG, Basel, Switzerland.

PMID: 37988395
DOI: 10.1089/cmb.2023.0208

Abstract

Statistical models that accurately predict the binding affinity of an input ligand-protein pair can greatly accelerate drug discovery. Such models are trained on available ligand-protein interaction data sets, which may contain biases that lead the predictor models to learn data set-specific, spurious patterns instead of generalizable relationships. As a result, the prediction performance of these models drops dramatically for previously unseen biomolecules. Various approaches that aim to improve model generalizability either have limited applicability or risk degrading overall prediction performance. In this article, we present DebiasedDTA, a novel training framework for drug-target affinity (DTA) prediction models that addresses data set biases to improve the generalizability of such models. DebiasedDTA reweights the training samples to achieve robust generalization and is thus applicable to most DTA prediction models. Extensive experiments with different biomolecule representations, model architectures, and data sets demonstrate that DebiasedDTA improves generalizability in predicting drug-target affinities.
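The abstract states that DebiasedDTA works by reweighting the training samples. It does not say how the weights are computed. The sketch below is a generic illustration of importance weighting, one of the article's keywords. It shows how per-sample weights enter a regression loss and change the fitted model. It is not DebiasedDTA's implementation. Several parts are hypothetical: the function names `weighted_mse` and `weighted_linear_fit`, the toy data, and the hand-picked weights. The one-dimensional linear model is only a stand-in for a real DTA predictor.

```python
from typing import Sequence, Tuple


def weighted_mse(
    y_true: Sequence[float], y_pred: Sequence[float], weights: Sequence[float]
) -> float:
    """Importance-weighted mean squared error (hypothetical helper).

    Each squared error is scaled by its sample weight and the sum is divided
    by the total weight, so uniform weights reproduce the ordinary MSE.
    """
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("y_true, y_pred and weights must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights)) / total


def weighted_linear_fit(
    x: Sequence[float], y: Sequence[float], weights: Sequence[float]
) -> Tuple[float, float]:
    """Closed-form minimizer of weighted_mse for y ~ slope * x + intercept.

    Hypothetical stand-in for a DTA predictor: it only shows how sample
    weights change which training points the fitted model follows.
    """
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    cov = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    var = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    if var == 0:
        raise ValueError("weighted variance of x is zero; slope is undefined")
    slope = cov / var
    return slope, y_bar - slope * x_bar


if __name__ == "__main__":
    # Hypothetical toy data: the third sample plays the role of a training
    # point that reflects a data set-specific pattern rather than the trend.
    x = [0.0, 1.0, 2.0]
    y = [0.0, 1.0, 5.0]
    print(weighted_linear_fit(x, y, [1.0, 1.0, 1.0]))  # (2.5, -0.5)
    print(weighted_linear_fit(x, y, [1.0, 1.0, 0.0]))  # (1.0, 0.0)
```

Setting the third weight to zero lets the fit follow the remaining two points exactly, while uniform weights pull the slope toward the outlying sample. Because only the loss changes, this kind of reweighting can wrap most models that minimize a per-sample loss. In a real framework, the weights would come from a data-driven procedure rather than being chosen by hand.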

Keywords: computational drug discovery; drug–target affinity; importance weighting; out-of-distribution generalization; spurious correlation; virtual drug screening.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Drug Discovery
Ligands
Models, Statistical*
Proteins* / chemistry

Substances

Ligands
Proteins