Implications of Additivity and Nonadditivity for Machine Learning and Deep Learning Models in Drug Design

Karolina Kwapien; Eva Nittinger; Jiazhen He; Christian Margreitter; Alexey Voronov; Christian Tyrchan

doi:10.1021/acsomega.2c02738

ACS Omega. 2022 Jul 19;7(30):26573-26581. doi: 10.1021/acsomega.2c02738. eCollection 2022 Aug 2.

Authors

Karolina Kwapien¹, Eva Nittinger¹, Jiazhen He², Christian Margreitter², Alexey Voronov², Christian Tyrchan¹

Affiliations

¹ Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 83, Sweden.
² Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 431 83, Sweden.

Abstract

Matched molecular pairs (MMPs) are now a commonly applied concept in drug design. They are used in many computational tools for structure-activity relationship analysis, biological activity prediction, or optimization of physicochemical properties. However, it has not yet been shown rigorously that MMPs, that is, pairs of molecules that differ in only one substituent, can be predicted with higher accuracy and precision than any other pair of chemical compounds. It is expected that any model should be able to predict such a defined change with high accuracy and reasonable precision. In this study, we examine the predictability of four classical properties relevant for drug design, ranging from simple physicochemical parameters (log D and solubility) to more complex cell-based ones (permeability and clearance), using different data sets and machine learning algorithms. Our study confirms that additive data are the easiest to predict, which highlights the importance of recognizing nonadditivity events and the complexity of predicting properties in the case of scaffold hopping. Although deep learning is well suited to modeling nonlinear events, these methods do not appear to be an exception to this observation. They generally perform better than classical machine learning methods, but the challenge remains open for the field.
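
The abstract does not spell out how additivity is quantified. As an illustration only, the sketch below assumes the commonly used double-transformation-cycle definition: four compounds (a parent, two single-change analogues, and the analogue carrying both changes) are additive when the effect of one change is the same regardless of whether the other change is present. The function names, the example values, and the 0.5 log-unit threshold are hypothetical and are not taken from the paper.

```python
def nonadditivity(p_parent, p_single_a, p_single_b, p_double):
    """Nonadditivity of a double-transformation cycle (assumed definition).

    p_parent   : property of the parent compound
    p_single_a : property after applying only change A
    p_single_b : property after applying only change B
    p_double   : property after applying both changes A and B

    Returns the difference between the effect of change A in the presence
    of B and the effect of change A in the absence of B. A value of zero
    means the two changes contribute independently (perfect additivity).
    """
    effect_a_with_b = p_double - p_single_b
    effect_a_without_b = p_single_a - p_parent
    return effect_a_with_b - effect_a_without_b


def is_additive(cycle, threshold=0.5):
    """Flag a cycle as additive if |nonadditivity| is below a threshold.

    The default threshold is a hypothetical placeholder; in practice it
    should reflect the experimental uncertainty of the assay.
    """
    return abs(nonadditivity(*cycle)) < threshold


# Hypothetical log D values, for illustration only.
additive_cycle = (1.0, 1.5, 2.0, 2.5)      # change A adds 0.5 in both contexts
nonadditive_cycle = (1.0, 1.5, 2.0, 3.5)   # change A adds 0.5 alone, 1.5 with B

print(nonadditivity(*additive_cycle), is_additive(additive_cycle))
print(nonadditivity(*nonadditive_cycle), is_additive(nonadditive_cycle))
```

Under this assumed definition, a model that sums independent substituent contributions can reproduce the first cycle but not the second, which is one way to read the abstract's statement that additive data are the easiest to predict.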