Chemical structure-aware molecular image representation learning

Brief Bioinform. 2023 Sep 22;24(6):bbad404. doi: 10.1093/bib/bbad404.

Abstract

Current methods of molecular image-based drug discovery face two major challenges: (1) work effectively in absence of labels, and (2) capture chemical structure from implicitly encoded images. Given that chemical structures are explicitly encoded by molecular graphs (such as nitrogen, benzene rings and double bonds), we leverage self-supervised contrastive learning to transfer chemical knowledge from graphs to images. Specifically, we propose a novel Contrastive Graph-Image Pre-training (CGIP) framework for molecular representation learning, which learns explicit information in graphs and implicit information in images from large-scale unlabeled molecules via carefully designed intra- and inter-modal contrastive learning. We evaluate the performance of CGIP on multiple experimental settings (molecular property prediction, cross-modal retrieval and distribution similarity), and the results show that CGIP can achieve state-of-the-art performance on all 12 benchmark datasets and demonstrate that CGIP transfers chemical knowledge in graphs to molecular images, enabling image encoder to perceive chemical structures in images. We hope this simple and effective framework will inspire people to think about the value of image for molecular representation learning.

Keywords: contrastive learning; convolutional neutral network; drug discovery; graph neural networks; unsupervised learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking*
  • Drug Discovery
  • Humans
  • Learning*