Interpretation of Ligand-Based Activity Cliff Prediction Models Using the Matched Molecular Pair Kernel

Shunsuke Tamura; Swarit Jasial; Tomoyuki Miyao; Kimito Funatsu

doi:10.3390/molecules26164916

Interpretation of Ligand-Based Activity Cliff Prediction Models Using the Matched Molecular Pair Kernel

Molecules. 2021 Aug 13;26(16):4916. doi: 10.3390/molecules26164916.

Authors

Shunsuke Tamura¹, Swarit Jasial^{1

2}, Tomoyuki Miyao^{1

2}, Kimito Funatsu²

Affiliations

¹ Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma 630-0192, Japan.
² Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma 630-0192, Japan.

Abstract

Activity cliffs (ACs) are formed by two structurally similar compounds with a large difference in potency. Accurate AC prediction is expected to help researchers' decisions in the early stages of drug discovery. Previously, predictive models based on matched molecular pair (MMP) cliffs have been proposed. However, the proposed methods face a challenge of interpretability due to the black-box character of the predictive models. In this study, we developed interpretable MMP fingerprints and modified a model-specific interpretation approach for models based on a support vector machine (SVM) and MMP kernel. We compared important features highlighted by this SVM-based interpretation approach and the SHapley Additive exPlanations (SHAP) as a major model-independent approach. The model-specific approach could capture the difference between AC and non-AC, while SHAP assigned high weights to the features not present in the test instances. For specific MMPs, the feature weights mapped by the SVM-based interpretation method were in agreement with the previously confirmed binding knowledge from X-ray co-crystal structures, indicating that this method is able to interpret the AC prediction model in a chemically intuitive manner.

Keywords: SHapley Additive exPlanations; activity-cliff; chemoinformatics; matched molecular pair; model interpretation; support vector machine.