An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models

Michael Brocidiacono; Konstantin I Popov; Alexander Tropsha

ArXiv [Preprint]. 2024 Mar 15:arXiv:2403.10478v1.

Authors

Michael Brocidiacono¹, Konstantin I Popov¹, Alexander Tropsha¹

Affiliation

¹ University of North Carolina at Chapel Hill.

PMID: 38560736
PMCID: PMC10980085

Abstract

Structure-based virtual screening (SBVS) is a key workflow in computational drug discovery. SBVS models are assessed by measuring the enrichment of known active molecules over decoys in retrospective screens. However, the standard formula for enrichment cannot estimate model performance on very large libraries. Additionally, current screening benchmarks cannot easily be used with machine learning (ML) models due to data leakage. We propose an improved formula for calculating virtual screening enrichment and introduce the BayesBind benchmarking set, composed of protein targets that are structurally dissimilar to those in the BigBind training set. We assess current models on this benchmark and find that none performs appreciably better than a k-nearest-neighbors (KNN) baseline. We publicly release the BayesBind benchmark at https://github.com/molecularmodelinglab/bigbind.
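
For orientation, the conventional enrichment factor that retrospective screens typically report can be sketched as follows. This is a minimal illustration of the widely used textbook definition, not the improved formula proposed in the preprint, which the abstract does not spell out. The function name `enrichment_factor`, its parameters, and the convention that a higher score means a better-ranked compound are all assumptions made for this example.

```python
import math


def enrichment_factor(scores, is_active, fraction):
    """Conventional enrichment factor (illustrative, not the paper's metric).

    Ratio of the active rate among the top-scoring `fraction` of the
    library to the active rate in the whole library.
    Assumes a higher score means the model ranks the compound as better.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Number of compounds in the selected top fraction (at least one).
    n_selected = max(1, math.ceil(fraction * n_total))

    # Rank compounds from best to worst score and count actives at the top.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, flag in ranked[:n_selected] if flag)

    return (hits / n_selected) / (n_actives / n_total)


# Toy library: 10 compounds, 2 actives, both ranked at the very top.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_labels = [True, True, False, False, False, False, False, False, False, False]
ef_half = enrichment_factor(toy_scores, toy_labels, 0.5)
```

Under this definition the value can never exceed the total library size divided by the number of actives, nor (up to rounding of the selection size) the reciprocal of the selected fraction. A benchmark with a fixed active-to-decoy ratio therefore caps the enrichment it can report, which is one commonly noted limitation of the standard formula and may be part of what motivates the improved metric.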

Publication types

Preprint

Grants and funding

R01 GM140154/GM/NIGMS NIH HHS/United States