Molecular property prediction by semantic-invariant contrastive learning

Ziqiao Zhang; Ailin Xie; Jihong Guan; Shuigeng Zhou

doi:10.1093/bioinformatics/btad462

Bioinformatics. 2023 Aug 1;39(8):btad462. doi: 10.1093/bioinformatics/btad462.

Authors

Ziqiao Zhang¹, Ailin Xie¹, Jihong Guan², Shuigeng Zhou¹

Affiliations

¹ Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China.
² Department of Computer Science and Technology, Tongji University, Shanghai 201804, China.

Abstract

Motivation: Contrastive learning has been widely used as a pretext task for self-supervised pre-training of molecular representation learning models in AI-aided drug design and discovery. However, existing methods that generate molecular views for contrastive learning by adding noise may suffer from semantic inconsistency, which produces false positive pairs and consequently poor prediction performance.

Results: To address this problem, we first propose a semantic-invariant view generation method that appropriately breaks molecular graphs into fragment pairs. Based on this view generation method, we then develop a Fragment-based Semantic-Invariant Contrastive Learning (FraSICL) model for molecular property prediction. FraSICL uses two branches to generate representations of views for contrastive learning, and introduces a multi-view fusion and an auxiliary similarity loss to make better use of the information contained in different fragment-pair views. Extensive experiments on various benchmark datasets show that FraSICL achieves state-of-the-art performance compared with major existing counterpart models while using the fewest pre-training samples.
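
To make the idea of fragment-pair views concrete, the following is a minimal sketch and not the FraSICL implementation. The abstract does not state how molecular graphs are broken, so this example assumes one plausible rule: cut a single non-ring bond, so that the molecule splits into exactly two fragments and no atom is altered or dropped. The function name `fragment_pairs` and the atom-index bond list are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def fragment_pairs(bonds):
    """Return one (fragment_a, fragment_b) pair for each bond whose removal
    splits the molecular graph in two.

    `bonds` is a list of (atom_index, atom_index) tuples. Assumption: a valid
    cut is any single non-ring bond; the paper's actual rule may differ.
    """
    adjacency = defaultdict(set)
    for u, v in bonds:
        adjacency[u].add(v)
        adjacency[v].add(u)

    def reachable(start, cut):
        # Atoms reachable from `start` when the bond `cut` is ignored.
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if {node, neighbour} == set(cut):
                    continue
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        return seen

    pairs = []
    for u, v in bonds:
        side_u = reachable(u, (u, v))
        if v in side_u:
            # Ring bond: removing it does not separate the molecule.
            continue
        side_v = reachable(v, (u, v))
        pairs.append((frozenset(side_u), frozenset(side_v)))
    return pairs


# Ethylbenzene-like skeleton: a six-membered ring (atoms 0-5) with a
# two-atom chain (atoms 6 and 7) attached to atom 0.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]
pairs = fragment_pairs(bonds)
for fragment_a, fragment_b in pairs:
    print(sorted(fragment_a), sorted(fragment_b))
```

In this sketch each pair of fragments jointly contains every atom of the original molecule, which is the sense in which such a view avoids the information loss of noise-based augmentation. How the views are then encoded and fused is not covered here.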

Availability and implementation: The code is publicly available at https://github.com/ZiqiaoZhang/FraSICL.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Benchmarking*
Models, Molecular
Semantics*