Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values

Raquel Rodríguez-Pérez; Jürgen Bajorath

doi:10.1021/acs.jmedchem.9b01101

J Med Chem. 2020 Aug 27;63(16):8761-8777. doi: 10.1021/acs.jmedchem.9b01101. Epub 2019 Sep 26.

Authors

Raquel Rodríguez-Pérez¹,², Jürgen Bajorath¹

Affiliations

¹ Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany.
² Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riß, Germany.

PMID: 31512867
DOI: 10.1021/acs.jmedchem.9b01101

Abstract

In qualitative or quantitative studies of structure-activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have a distinctive black-box character. Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.
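To make the idea of attributing a predicted activity to individual structural features more concrete, the following minimal sketch computes exact Shapley values for a single compound by enumerating all feature coalitions. It is an illustration only and not the authors' implementation: the scoring function `toy_activity_model`, the four-bit fingerprint, and the all-zero reference compound are hypothetical, and the brute-force enumeration shown here is feasible only for a handful of features, whereas SHAP relies on approximations for realistic fingerprints.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all feature coalitions."""
    n = len(x)
    features = range(n)

    def value(coalition):
        # Features in the coalition take the instance's values;
        # all other features take the baseline's values.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_activity_model(bits):
    # Hypothetical nonlinear "probability of activity" over four fingerprint bits
    # (an assumption for illustration, not a model from the paper).
    score = 0.1 + 0.5 * bits[0] * bits[1] + 0.3 * bits[2] - 0.2 * bits[3]
    return min(max(score, 0.0), 1.0)


compound = [1, 1, 0, 1]   # hypothetical test compound
reference = [0, 0, 0, 0]  # hypothetical baseline: no features present
phi = shapley_values(toy_activity_model, compound, reference)

for index, contribution in enumerate(phi):
    print(f"feature {index}: {contribution:+.4f}")
```

The contributions sum to the difference between the prediction for the compound and the prediction for the reference, which is the additivity property that allows feature contributions to be mapped back onto a structure. A feature whose value matches the baseline, such as bit 2 here, receives a contribution of zero.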

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Deep Learning / statistics & numerical data*
Organic Chemicals / chemistry*
Support Vector Machine / statistics & numerical data*

Substances

Organic Chemicals