Combining Group-Contribution Concept and Graph Neural Networks Toward Interpretable Molecular Property Models

Adem R N Aouichaoui; Fan Fan; Seyed Soheil Mansouri; Jens Abildskov; Gürkan Sin

doi:10.1021/acs.jcim.2c01091


J Chem Inf Model. 2023 Feb 13;63(3):725-744. doi: 10.1021/acs.jcim.2c01091. Epub 2023 Jan 30.

Affiliation

¹ Process and Systems Engineering Center (PROSYS), Department of Chemical and Biochemical Engineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.

PMID: 36716461

Abstract

Quantitative structure-property relationships (QSPRs) are important tools for facilitating and accelerating the discovery of compounds with desired properties. While many QSPRs have been developed, they suffer from various shortcomings, such as limited generalizability and modest accuracy. Although various machine-learning and deep-learning techniques have been integrated into such models, this has introduced another shortcoming: a lack of transparency and interpretability. In this work, two interpretable graph neural network (GNN) models, attentive group-contribution (AGC) and group-contribution-based graph attention (GroupGAT), are developed by integrating domain fundamentals through the concept of group contributions (GC). Interpretability is achieved by using the attention mechanism to highlight the substructures with the highest attention weights in the latent representation of the molecules. The proposed models outperformed classical GC models, as well as various other GNN models, in describing the aqueous solubility, melting point, and enthalpies of formation, combustion, and fusion of organic compounds. The insights provided are consistent with those obtained from semiempirical GC models, confirming that the proposed framework can highlight the substructures of a molecule that are important for a specific property.
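
The idea of highlighting substructures through attention weights can be illustrated with a minimal sketch. It is not the AGC or GroupGAT architecture from the paper. Instead, it makes three simplifying assumptions. A molecule is assumed to be already split into GC groups. Each group has a hypothetical embedding vector; in a trained model these would come from the GNN. The groups are scored against a hypothetical query vector using a dot product followed by softmax, which is a generic attention-pooling scheme. A first-order classical GC sum is included for comparison. The group names, contributions, and vectors are made-up numbers for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classical_gc(group_counts, contributions, offset=0.0):
    """First-order GC estimate: offset + sum_i N_i * C_i.
    Contribution values passed in here are hypothetical, not fitted GC tables."""
    return offset + sum(n * contributions[g] for g, n in group_counts.items())

def attentive_readout(group_embeddings, query):
    """Generic attention pooling (an assumption, not the paper's exact layer):
    score each group embedding by its dot product with the query, normalise
    the scores with softmax, and return the weighted-sum vector together
    with the per-group attention weights."""
    names = list(group_embeddings)
    scores = [sum(q * h for q, h in zip(query, group_embeddings[g])) for g in names]
    weights = softmax(scores)
    dim = len(query)
    pooled = [sum(w * group_embeddings[g][k] for w, g in zip(weights, names))
              for k in range(dim)]
    return pooled, dict(zip(names, weights))

def most_attended(weights):
    """Return the group with the highest attention weight (the one to highlight)."""
    return max(weights, key=weights.get)

# Hypothetical ethanol-like decomposition; all numbers are illustrative.
counts = {"CH3": 1, "CH2": 1, "OH": 1}
contribs = {"CH3": 1.0, "CH2": 0.5, "OH": 2.0}
embeddings = {"CH3": [0.1, 0.2], "CH2": [0.0, 0.1], "OH": [1.0, 0.5]}
query = [1.0, 1.0]  # hypothetical; would be learned in a real model

print(classical_gc(counts, contribs))   # → 3.5
pooled, weights = attentive_readout(embeddings, query)
print(most_attended(weights))           # → OH
```

In this toy setup the weights sum to one, so each weight can be read as a group's relative share of the pooled representation. The highest-weighted group is the substructure that would be highlighted. The classical GC sum, by contrast, attributes the property through fixed, additive contributions.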

MeSH terms

Machine Learning*
Models, Molecular
Neural Networks, Computer*
Quantitative Structure-Activity Relationship