Garbage in, garbage out: how reliable training data improved a virtual screening approach against SARS-CoV-2 MPro

Front Pharmacol. 2023 Jun 22:14:1193282. doi: 10.3389/fphar.2023.1193282. eCollection 2023.

Abstract

Introduction: The identification of chemical compounds that interfere with SARS-CoV-2 replication continues to be a priority in several academic and pharmaceutical laboratories. Computational tools and approaches have the power to integrate, process and analyze multiple data in a short time. However, these initiatives may yield unrealistic results if the applied models are not inferred from reliable data and the resulting predictions are not confirmed by experimental evidence.
Methods: We undertook a drug discovery campaign against the essential major protease (MPro) from SARS-CoV-2, which relied on an in silico search strategy, performed in a large and diverse chemolibrary, complemented by experimental validation. The computational method comprises a recently reported ligand-based approach developed upon refinement/learning cycles, and structure-based approximations. Search models were applied to both retrospective (in silico) and prospective (experimentally confirmed) screening.
Results: The first generation of ligand-based models was fed with data that, to a great extent, had not been published in peer-reviewed articles. The first screening campaign performed with 188 compounds (46 in silico hits and 100 analogues, and 40 unrelated compounds: flavonols and pyrazoles) yielded three hits against MPro (IC50 ≤ 25 μM): two analogues of in silico hits (one glycoside and one benzothiazole) and one flavonol. A second generation of ligand-based models was developed based on this negative information and newly published peer-reviewed data for MPro inhibitors. This led to 43 new hit candidates belonging to different chemical families. From 45 compounds (28 in silico hits and 17 related analogues) tested in the second screening campaign, eight inhibited MPro with IC50 = 0.12-20 μM and five of them also impaired the proliferation of SARS-CoV-2 in Vero cells (EC50 = 7-45 μM).
Discussion: Our study provides an example of a virtuous loop between computational and experimental approaches applied to target-focused drug discovery against a major and global pathogen, reaffirming the well-known "garbage in, garbage out" machine learning principle.
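As a rough illustration of the improvement between the two campaigns, the reported counts can be turned into hit rates. The sketch below uses only the numbers stated in the abstract (3 hits out of 188 compounds, then 8 out of 45). The function name `hit_rate` is hypothetical, and the comparison is approximate: the two campaigns report different IC50 ranges, and the first set includes 40 compounds unrelated to the in silico hits.

```python
def hit_rate(hits: int, tested: int) -> float:
    """Fraction of tested compounds that were confirmed as hits."""
    if tested <= 0:
        raise ValueError("tested must be a positive number of compounds")
    return hits / tested


# Counts taken from the abstract; nothing else is assumed.
first_campaign = hit_rate(hits=3, tested=188)   # first-generation models
second_campaign = hit_rate(hits=8, tested=45)   # second-generation models

# Ratio of the two hit rates (how many times higher the second one is).
fold_change = second_campaign / first_campaign

print(f"First campaign hit rate:  {first_campaign:.1%}")
print(f"Second campaign hit rate: {second_campaign:.1%}")
print(f"Fold change:              {fold_change:.1f}x")
```

Under these assumptions the hit rate rises from about 1.6% to about 17.8%, roughly an 11-fold increase.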

Keywords: COVID-19; artificial intelligence; coronavirus; drug discovery; in silico screening; protease; rubbish in rubbish out; target-based.

Grants and funding

The financial support of the Urgence COVID-19 Fundraising Campaign of Institut Pasteur, the International Centre for Genetic Engineering and Biotechnology (CRP/URY20-03) and FOCEM (Fondo para la Convergencia Estructural del Mercosur, grant number COF 03/11) is gratefully acknowledged. Additional support was provided by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT, No. NRF-2017M3A9G6068254) and a grant funded by the German Research Foundation (KU 1371/9-1). SMR acknowledges the support of the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET, Argentina) for a postdoctoral fellowship. AA, VA, and CC acknowledge the support of the Programa de Alimentos y Salud Humana (PAyS) IDB-R.O.U. (4950/OC-UR). AH-C acknowledges the support of CONACyT (Proyecto No. 251726).