Technique of Augmenting Molecular Graph Data by Perturbating Hidden Features

Mol Inform. 2022 Jul;41(7):e2100267. doi: 10.1002/minf.202100267. Epub 2022 Jan 27.

Abstract

Quantitative structure-property relationship models are useful for efficiently searching for molecules with desired properties in drug discovery and materials development. In recent years, many such models based on graph neural networks have been reported to show good prediction performance. Training graph neural networks generally requires many samples, but with a training method suited to small datasets, it is possible to extract features that enable successful prediction. Herein, we design a method of augmenting graph data. In this method, random perturbations are added with a certain probability to some vertex features during message passing. We verify the proposed method's effectiveness in regression and classification tasks. We confirm that the proposed method is effective when the perturbation is added immediately before the readout of the graph neural network, and that the effect of the data augmentation is most evident for small datasets of approximately 1000 samples.
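
The idea of perturbing vertex features immediately before the readout can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the noise distribution, the selection scheme, or the network architecture. The sketch assumes Gaussian noise, an independent per-vertex selection probability p, a simple sum-aggregation message-passing layer, and a sum readout. All function names and parameters (perturb_vertex_features, message_passing, forward, p, sigma) are hypothetical.

```python
import numpy as np


def perturb_vertex_features(h, p=0.1, sigma=0.1, rng=None):
    """Add noise to a random subset of vertices (rows of h).

    Assumptions (not from the abstract): each vertex is selected
    independently with probability p, and the noise is Gaussian
    with standard deviation sigma.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(h.shape[0]) < p             # which vertices to perturb
    noise = rng.normal(0.0, sigma, size=h.shape)  # candidate noise for every entry
    return h + mask[:, None] * noise              # noise applied only to selected rows


def message_passing(h, adj, weight):
    """One illustrative layer: sum over self and neighbours, linear map, ReLU."""
    agg = (adj + np.eye(adj.shape[0])) @ h
    return np.maximum(agg @ weight, 0.0)


def forward(h, adj, weights, p=0.1, sigma=0.1, training=True, rng=None):
    """Run all message-passing layers, perturb hidden features, then read out."""
    for w in weights:
        h = message_passing(h, adj, w)
    if training:
        # Perturbation placed immediately before the readout,
        # the position the abstract reports as effective.
        h = perturb_vertex_features(h, p, sigma, rng)
    return h.sum(axis=0)                          # sum readout (an assumption)
```

With training=False the perturbation is skipped, so the same graph always yields the same graph-level vector. With training=True each forward pass can see a slightly different hidden representation of the same molecule, which is how the perturbation acts as data augmentation.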

Keywords: Chemoinformatics; Data augmentation; Graph neural network; Structure-property relationships; Virtual screening.

MeSH terms

  • Drug Discovery
  • Neural Networks, Computer*
  • Quantitative Structure-Activity Relationship*