Findings of the Second Challenge to Predict Aqueous Solubility

Antonio Llinas; Ioana Oprisiu; Alex Avdeef

doi:10.1021/acs.jcim.0c00701

J Chem Inf Model. 2020 Oct 26;60(10):4791-4803. doi: 10.1021/acs.jcim.0c00701. Epub 2020 Sep 3.

Authors

Antonio Llinas¹, Ioana Oprisiu², Alex Avdeef³

Affiliations

¹ DMPK, Research and Early Development, Respiratory & Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg SE 431 50, Sweden.
² Data Science & Artificial Intelligence, Imaging & Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Gothenburg SE 431 50, Sweden.
³ in-ADME Research, 1732 First Avenue, #102, New York, New York 10128, United States.

PMID: 32794744
DOI: 10.1021/acs.jcim.0c00701

Abstract

Ten years ago, we issued an open prediction challenge to the cheminformatics community: would participants be able to predict the equilibrium intrinsic solubilities of 32 druglike molecules using only a high-precision set of 100 compounds (measured with the CheqSol instrument in a single laboratory) as a training set? The "solubility challenge" was a widely recognized success and spurred many discussions about the prediction methods and the quality of data. We recently revisited the competition and posed a different task to the community: not a blind test this time, but one using a larger test set of molecules gathered and curated from published sources (mostly "gold standard" saturation shake-flask measurements), for which the average interlaboratory reproducibility was estimated to be ∼0.17 log unit. A second test set was also included, comprising "contentious" molecules whose reported (mostly shake-flask) solubilities had a higher average uncertainty, ∼0.62 log unit. In the second competition, the participants were invited to use their own training sets, provided that the training sets did not contain any of the test set molecules. We were motivated to revisit the competition to (1) examine to what extent computational methods had improved in 10 years, (2) verify that data quality may not be the main limiting factor in the accuracy of the prediction method, and (3) attempt to seek a relationship between the makeup of the training set data and the prediction outcome.
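
The rule that training sets must not contain test set molecules, and the comparison of prediction error with uncertainties quoted in log units, can be illustrated with a minimal Python sketch. This is not the authors' procedure. The record layout, the `key` field (assumed to hold some canonical identifier such as an InChIKey), and the function names `remove_test_overlap` and `rmse_log_units` are all hypothetical, and the identifiers shown are placeholders, not challenge data.

```python
import math


def remove_test_overlap(training, test_keys):
    """Return the training records whose identifier is not in the test set.

    Assumption: each record is a dict with a "key" field holding a canonical
    molecule identifier; matching is case- and whitespace-insensitive.
    """
    blocked = {k.strip().upper() for k in test_keys}
    return [rec for rec in training if rec["key"].strip().upper() not in blocked]


def rmse_log_units(predicted, measured):
    """Root-mean-square error between predicted and measured log solubilities."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder records for illustration only (not real measurements).
training = [
    {"key": "MOL-A", "log_s": -3.0},
    {"key": "mol-b ", "log_s": -4.5},
    {"key": "MOL-C", "log_s": -2.1},
]
test_keys = ["MOL-B"]

filtered = remove_test_overlap(training, test_keys)
error = rmse_log_units([1.0, 2.0], [1.0, 4.0])
```

An RMSE computed this way is in the same log units as the ∼0.17 and ∼0.62 uncertainty estimates, so it can be read against them directly. In practice the identifiers would need to come from a consistent canonicalization step, which this sketch does not perform.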

MeSH terms

Cheminformatics
Humans
Pharmaceutical Preparations*
Reproducibility of Results
Solubility
Water*

Substances

Pharmaceutical Preparations
Water