Interpreting Neural Network Models for Toxicity Prediction by Extracting Learned Chemical Features

J Chem Inf Model. 2024 May 13;64(9):3670-3688. doi: 10.1021/acs.jcim.4c00127. Epub 2024 Apr 30.

Abstract

Neural network models have become a popular machine-learning technique for the toxicity prediction of chemicals. However, due to their complex structure, it is difficult to understand the predictions made by these models, which limits confidence in them. Current techniques to tackle this problem, such as SHAP or integrated gradients, provide insights by attributing importance to the input features of individual compounds. While these methods have produced promising results in some cases, they do not shed light on how representations of compounds are transformed in hidden layers, which is central to how neural networks learn. We present a novel technique for interpreting neural networks that identifies chemical substructures in the training data found to be responsible for the activation of hidden neurons. For individual test compounds, the importance of hidden neurons is determined, and the associated substructures are leveraged to explain the model prediction. Using structural alerts for mutagenicity from the Derek Nexus expert system as ground truth, we demonstrate the validity of the approach and show that model explanations are competitive with, and complementary to, explanations obtained from an established feature attribution method.
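The abstract describes the approach only at a high level. The sketch below illustrates the general idea under several assumptions that are not stated in the abstract: a single-hidden-layer network over binary substructure fingerprints, random stand-in weights in place of a trained toxicity model, a simple enrichment heuristic for linking substructures to neurons, and hypothetical names such as `substructure_names` and `neuron_substructures`. It is not the authors' implementation.

```python
# Minimal sketch only: a one-hidden-layer network over binary "substructure
# present/absent" features. In practice the weights would come from a trained
# toxicity model and the features from real fingerprints; here both are toy data.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_feats, n_hidden = 200, 32, 8
substructure_names = [f"substructure_{i}" for i in range(n_feats)]  # hypothetical labels

X_train = (rng.random((n_train, n_feats)) < 0.2).astype(float)      # toy training fingerprints
W1, b1 = rng.normal(size=(n_feats, n_hidden)), np.zeros(n_hidden)   # stand-ins for learned weights
w2 = rng.normal(size=n_hidden)                                       # output-layer weights

def hidden(X):
    """ReLU activations of the hidden layer."""
    return np.maximum(X @ W1 + b1, 0.0)

# Step 1: for each hidden neuron, collect substructures enriched in the training
# compounds that activate it most strongly (one possible heuristic).
H_train = hidden(X_train)
neuron_substructures = {}
for j in range(n_hidden):
    top = np.argsort(H_train[:, j])[-20:]                  # top-activating training compounds
    enrichment = X_train[top].mean(0) - X_train.mean(0)    # feature enrichment vs. background
    neuron_substructures[j] = [substructure_names[i]
                               for i in np.argsort(enrichment)[-3:][::-1]]

# Step 2: for a test compound, rank hidden neurons by their contribution to the
# prediction (activation * output weight) and surface the linked substructures.
x_test = (rng.random(n_feats) < 0.2).astype(float)
h = hidden(x_test[None, :])[0]
contribution = h * w2
for j in np.argsort(np.abs(contribution))[::-1][:3]:
    print(f"neuron {j}: contribution {contribution[j]:+.3f}, "
          f"linked substructures: {neuron_substructures[j]}")
```

In a real setting, the binary features would be replaced by actual substructure fingerprints (e.g., Morgan bits mapped back to atom environments) and the enrichment step by whatever attribution the paper uses; the mechanics of ranking hidden neurons for a test compound and reporting their associated substructures would remain the same.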

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Machine Learning
  • Neural Networks, Computer*