EdgeSHAPer: Bond-centric Shapley value-based explanation method for graph neural networks

Andrea Mastropietro; Giuseppe Pasculli; Christian Feldmann; Raquel Rodríguez-Pérez; Jürgen Bajorath

doi:10.1016/j.isci.2022.105043

iScience. 2022 Aug 30;25(10):105043. doi: 10.1016/j.isci.2022.105043. eCollection 2022 Oct 21.

Authors

Andrea Mastropietro¹, Giuseppe Pasculli¹, Christian Feldmann², Raquel Rodríguez-Pérez²,³, Jürgen Bajorath²

Affiliations

¹ Department of Computer, Control, and Management Engineering Antonio Ruberti (DIAG), Sapienza University, Rome, Italy.
² Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115 Bonn, Germany.
³ Novartis Institutes for Biomedical Research, Novartis Campus, 4002 Basel, Switzerland.

Abstract

Graph neural networks (GNNs) recursively propagate signals along the edges of an input graph, integrate node feature information with graph structure, and learn object representations. Like other deep neural network models, GNNs have a notorious black-box character. For GNNs, only a few approaches are available to rationalize model decisions. We introduce EdgeSHAPer, a generally applicable method for explaining GNN-based models. The approach is devised to assess edge importance for predictions. To this end, EdgeSHAPer makes use of the Shapley value concept from game theory. As a proof of concept, EdgeSHAPer is applied to compound activity prediction, a central task in drug discovery. EdgeSHAPer's edge centricity is relevant for molecular graphs, where edges represent chemical bonds. Combined with feature mapping, EdgeSHAPer produces intuitive explanations for compound activity predictions. Compared to a popular node-centric and another edge-centric GNN explanation method, EdgeSHAPer achieves higher resolution in differentiating the features that determine predictions and identifies minimal pertinent positive feature sets.
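To make the idea of edge-level Shapley values concrete, the following is a minimal sketch and not the authors' implementation. It computes exact Shapley values by enumerating every edge subset, which is only feasible for very small graphs; the abstract does not state how EdgeSHAPer itself computes or approximates these values. The function `shapley_edge_values`, the scoring function `toy_model` (a hand-written stand-in for a trained GNN's output), and the three-bond edge list are all hypothetical names and data introduced purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_edge_values(edges, value_fn):
    """Exact Shapley value of every edge for a set function value_fn.

    value_fn takes a frozenset of edges and returns a float (here, a
    stand-in for a model output on the graph restricted to those edges).
    """
    n = len(edges)
    phi = {}
    for edge in edges:
        others = [e for e in edges if e != edge]
        total = 0.0
        for k in range(n):  # k = size of the coalition without `edge`
            # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of `edge` when added to coalition s
                total += weight * (value_fn(s | {edge}) - value_fn(s))
        phi[edge] = total
    return phi


def toy_model(edge_subset):
    """Hypothetical scoring function, NOT a real GNN.

    One bond contributes on its own; two other bonds contribute only
    when both are present (an interaction effect).
    """
    score = 0.0
    if ("C1", "O1") in edge_subset:
        score += 1.0
    if ("C1", "C2") in edge_subset and ("C2", "N1") in edge_subset:
        score += 2.0
    return score


# Hypothetical three-bond "molecule"
edges = [("C1", "O1"), ("C1", "C2"), ("C2", "N1")]
phi = shapley_edge_values(edges, toy_model)

for edge, value in phi.items():
    print(edge, round(value, 6))
```

In this toy setting the two interacting bonds share their joint contribution equally, and the values sum to the difference between the score on the full graph and the score on the empty graph (the efficiency property of Shapley values). Exact enumeration scales exponentially in the number of edges, so any practical use on real molecular graphs would need an approximation scheme.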

Keywords: Artificial intelligence; Bioinformatics; Drugs.