An improved multi-modal representation-learning model based on fusion networks for property prediction in drug discovery

Jinzhou Wu; Yang Su; Ao Yang; Jingzheng Ren; Yi Xiang

doi:10.1016/j.compbiomed.2023.107452

Comput Biol Med. 2023 Oct:165:107452. doi: 10.1016/j.compbiomed.2023.107452. Epub 2023 Sep 9.

Authors

Jinzhou Wu¹, Yang Su², Ao Yang³, Jingzheng Ren⁴, Yi Xiang¹

Affiliations

¹ School of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing, 401331, China.
² School of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing, 401331, China. Electronic address: 2020032@cqust.edu.cn.
³ School of Safety Engineering (School of Emergency Management), Chongqing University of Science and Technology, Chongqing, 401331, China.
⁴ Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, China.

PMID: 37690287

Abstract

Accurate molecular representations play an important role in deep learning (DL)-based property prediction for drug discovery. However, most previous studies considered only one type of molecular representation, which makes it difficult to capture the full molecular feature information. In this study, a novel DL framework called the multi-modal molecular representation learning fusion network (MMRLFN) is developed, which can simultaneously learn and integrate drug molecular features from molecular graphs and SMILES sequences. The MMRLFN is composed of three complementary deep neural networks that learn different features from the different molecular representations, such as molecular topology, local chemical background information, and substructures at varying scales. Eight public datasets covering various molecular properties used in drug discovery were employed to train and evaluate the MMRLFN. The resulting models performed better than existing models based on mono-modal molecular representations. Additionally, the noise resistance and interpretability of the MMRLFN were analyzed thoroughly, and its generalization ability and effectiveness were verified by case studies. Overall, the MMRLFN can accurately predict molecular properties and provide potentially valuable information from large datasets, thereby improving the chances of successful drug discovery.
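
The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fusion scheme, so simple concatenation followed by a linear prediction head is assumed here, and the names `fuse_features` and `linear_head`, the embedding values, and the weights are all hypothetical placeholders for the outputs of the three branch networks.

```python
from typing import List, Sequence


def fuse_features(*branch_embeddings: Sequence[float]) -> List[float]:
    """Concatenate per-branch embeddings into one fused feature vector.

    Concatenation is an assumed fusion scheme, chosen only for illustration.
    """
    fused: List[float] = []
    for emb in branch_embeddings:
        fused.extend(float(x) for x in emb)
    return fused


def linear_head(features: Sequence[float],
                weights: Sequence[float],
                bias: float = 0.0) -> float:
    """Map the fused vector to a single property value with a linear layer."""
    if len(features) != len(weights):
        raise ValueError("weights must match the fused feature length")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Hypothetical embeddings standing in for the three branch outputs.
graph_emb = [0.5, -1.0]        # e.g. molecular topology from the graph
local_emb = [2.0]              # e.g. local chemical context from SMILES
substructure_emb = [1.0, 0.0]  # e.g. substructures at varying scales

fused = fuse_features(graph_emb, local_emb, substructure_emb)
prediction = linear_head(fused, weights=[1.0, 1.0, 0.5, 2.0, 3.0], bias=0.25)
print(fused, prediction)
```

In a real model each embedding would come from a trained network and the head would be learned jointly with the branches; the sketch only shows how features from several representations end up in one vector before prediction.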

Keywords: Deep learning; Drug discovery; Feature fusion; Multi-modal molecular representations; Property prediction.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Drug Discovery*
Neural Networks, Computer*