Interpreting Neural Network Models for Toxicity Prediction by Extracting Learned Chemical Features

J Chem Inf Model. 2024 May 13;64(9):3670-3688. doi: 10.1021/acs.jcim.4c00127. Epub 2024 Apr 30.

Abstract

Neural network models have become a popular machine-learning technique for the toxicity prediction of chemicals. However, due to their complex structure, it is difficult to understand the predictions made by these models, which limits confidence in them. Current techniques to tackle this problem, such as SHAP or integrated gradients, provide insights by attributing importance to the input features of individual compounds. While these methods have produced promising results in some cases, they do not shed light on how representations of compounds are transformed in hidden layers, which is central to how neural networks learn. We present a novel technique for interpreting neural networks that identifies chemical substructures in the training data found to be responsible for the activation of hidden neurons. For individual test compounds, the importance of hidden neurons is determined, and the associated substructures are leveraged to explain the model prediction. Using structural alerts for mutagenicity from the Derek Nexus expert system as ground truth, we demonstrate the validity of the approach and show that model explanations are competitive with, and complementary to, explanations obtained from an established feature attribution method.
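The abstract describes the approach only at a high level. The sketch below illustrates the general idea under several assumptions that are not stated in the abstract: a single-hidden-layer network over binary substructure fingerprints, random stand-in weights in place of a trained toxicity model, a simple enrichment heuristic for linking substructures to neurons, and hypothetical names such as `substructure_names` and `neuron_substructures`. It is not the authors' implementation.

```python
# Minimal sketch only: a one-hidden-layer network over binary "substructure
# present/absent" features. In practice the weights would come from a trained
# toxicity model and the features from real fingerprints; here both are toy data.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_feats, n_hidden = 200, 32, 8
substructure_names = [f"substructure_{i}" for i in range(n_feats)]  # hypothetical labels

X_train = (rng.random((n_train, n_feats)) < 0.2).astype(float)      # toy training fingerprints
W1, b1 = rng.normal(size=(n_feats, n_hidden)), np.zeros(n_hidden)   # stand-ins for learned weights
w2 = rng.normal(size=n_hidden)                                       # output-layer weights

def hidden(X):
    """ReLU activations of the hidden layer."""
    return np.maximum(X @ W1 + b1, 0.0)

# Step 1: for each hidden neuron, collect substructures enriched in the training
# compounds that activate it most strongly (one possible heuristic).
H_train = hidden(X_train)
neuron_substructures = {}
for j in range(n_hidden):
    top = np.argsort(H_train[:, j])[-20:]                  # top-activating training compounds
    enrichment = X_train[top].mean(0) - X_train.mean(0)    # feature enrichment vs. background
    neuron_substructures[j] = [substructure_names[i]
                               for i in np.argsort(enrichment)[-3:][::-1]]

# Step 2: for a test compound, rank hidden neurons by their contribution to the
# prediction (activation * output weight) and surface the linked substructures.
x_test = (rng.random(n_feats) < 0.2).astype(float)
h = hidden(x_test[None, :])[0]
contribution = h * w2
for j in np.argsort(np.abs(contribution))[::-1][:3]:
    print(f"neuron {j}: contribution {contribution[j]:+.3f}, "
          f"linked substructures: {neuron_substructures[j]}")
```

In a real setting, the binary features would be replaced by actual substructure fingerprints (e.g., Morgan bits mapped back to atom environments) and the enrichment step by whatever attribution the paper uses; the mechanics of ranking hidden neurons for a test compound and reporting their associated substructures would remain the same.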

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Machine Learning
  • Neural Networks, Computer*