An improved multi-modal representation-learning model based on fusion networks for property prediction in drug discovery

Comput Biol Med. 2023 Oct:165:107452. doi: 10.1016/j.compbiomed.2023.107452. Epub 2023 Sep 9.

Abstract

Accurate characterization of molecular representations plays an important role in deep learning (DL)-based property prediction for drug discovery. However, most previous studies considered only one type of molecular representation, which makes it difficult to capture the full range of molecular feature information. In this study, a novel DL framework called the multi-modal molecular representation learning fusion network (MMRLFN) is developed, which simultaneously learns and integrates drug molecular features from molecular graphs and SMILES sequences. MMRLFN is composed of three complementary deep neural networks that learn different features from the different molecular representations, such as molecular topology, local chemical context, and substructures at varying scales. Eight public datasets covering various molecular properties relevant to drug discovery were used to train and evaluate MMRLFN. The resulting models outperformed existing models based on mono-modal molecular representations. Additionally, the noise resistance and interpretability of MMRLFN were analyzed in detail. The generalization ability and effectiveness of MMRLFN were also verified through case studies. Overall, MMRLFN can accurately predict molecular properties and extract potentially valuable information from large datasets, thereby maximizing the likelihood of successful drug discovery.
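The abstract describes learning features from both molecular graphs and SMILES strings and fusing them. The sketch below only illustrates that general multi-modal fusion idea under stated assumptions. It is not MMRLFN, whose three sub-networks and fusion mechanism are not specified in the abstract. It uses two hypothetical branches built with NumPy and random, untrained weights. The first is a graph branch doing one round of mean-neighbour message passing. The second is a SMILES branch applying 1D convolutions of two widths, a rough stand-in for "substructures at varying scales". Their pooled outputs are concatenated (late fusion) and passed to a logistic output. All names (`graph_branch`, `smiles_branch`, `predict`), vocabularies, and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabularies and sizes (assumptions, not from the paper).
ATOM_TYPES = ["C", "N", "O"]
SMILES_CHARS = ["C", "N", "O", "(", ")", "=", "1"]
HIDDEN = 8          # graph-branch output size
CHANNELS = 4        # filters per convolution width
WIDTHS = (2, 3)     # convolution widths in characters ("multi-scale")


def one_hot(items, vocab):
    """Encode a token sequence as a (len(items), len(vocab)) one-hot matrix."""
    mat = np.zeros((len(items), len(vocab)))
    for i, tok in enumerate(items):
        mat[i, vocab.index(tok)] = 1.0  # raises ValueError on unknown tokens
    return mat


def graph_branch(atoms, adjacency, w):
    """One round of mean-neighbour message passing, then a mean readout."""
    x = one_hot(atoms, ATOM_TYPES)                      # (n_atoms, n_types)
    a_hat = adjacency + np.eye(len(atoms))              # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)    # row-normalise
    h = np.maximum(a_hat @ x @ w, 0.0)                  # ReLU(A_hat X W)
    return h.mean(axis=0)                               # permutation-invariant readout


def smiles_branch(smiles, kernels):
    """1D convolutions of several widths over a one-hot SMILES, max-pooled."""
    x = one_hot(list(smiles), SMILES_CHARS)             # one token per character
    pooled = []
    for k in kernels:                                   # k: (width, n_chars, CHANNELS)
        width = k.shape[0]
        windows = [np.einsum("wc,wcd->d", x[i:i + width], k)
                   for i in range(len(smiles) - width + 1)]
        pooled.append(np.maximum(np.max(windows, axis=0), 0.0))
    return np.concatenate(pooled)                       # (CHANNELS * len(WIDTHS),)


def predict(atoms, adjacency, smiles, params):
    """Late fusion: concatenate both branch vectors, then a logistic output."""
    g = graph_branch(atoms, adjacency, params["w_graph"])
    s = smiles_branch(smiles, params["kernels"])
    fused = np.concatenate([g, s])
    logit = fused @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))                 # probability of a binary property


params = {
    "w_graph": rng.normal(size=(len(ATOM_TYPES), HIDDEN)),
    "kernels": [rng.normal(size=(w, len(SMILES_CHARS), CHANNELS)) for w in WIDTHS],
    "w_out": rng.normal(size=HIDDEN + CHANNELS * len(WIDTHS)),
    "b_out": 0.0,
}

# Usage on ethanol (SMILES "CCO"): heavy atoms C-C-O with a chain adjacency.
ethanol_atoms = ["C", "C", "O"]
ethanol_adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
p = predict(ethanol_atoms, ethanol_adj, "CCO", params)
print(f"predicted probability (untrained weights): {p:.3f}")
```

Concatenation is the simplest fusion choice and is used here only for clarity. A practical pipeline would also need a real SMILES tokenizer (this one splits `Cl` into two characters), a cheminformatics toolkit to build the graph, and trained weights.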

Keywords: Deep learning; Drug discovery; Feature fusion; Multi-modal molecular representations; Property prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Drug Discovery*
  • Neural Networks, Computer*