ADME prediction with KNIME: A retrospective contribution to the second "Solubility Challenge"

ADMET DMPK. 2021 Jul 12;9(3):209-218. doi: 10.5599/admet.979. eCollection 2021.

Abstract

Computational models for predicting aqueous solubility from molecular structure are a promising strategy for drug design and discovery. Since the first "Solubility Challenge", these initiatives have marked the state of the art of the modelling algorithms used to predict drug solubility. In this regard, the quality of the input experimental data and its influence on model performance have been frequently discussed. In our previous study, we developed a computational model for aqueous solubility based on recursive random forest approaches. The aim of the current commentary is to analyse the performance of this already trained predictive model on the molecules of the second "Solubility Challenge". Even though our training set has inconsistencies related to the pH, solid form and temperature conditions of the solubility measurements, the model was able to predict the two sets from the second "Solubility Challenge" with statistics comparable to those of the top-ranked models. Finally, to ensure the applicability and reproducibility of our model, we provide an automated KNIME workflow for predicting the aqueous solubility of new drug candidates during the early stages of drug discovery and development.

Keywords: ADME; KNIME; Quantitative Structure-Property Relationship (QSPR); Random Forest; Second Solubility Challenge; aqueous solubility; machine learning; supervised recursive variable selection.