Mol2Context-vec: learning molecular representation from context awareness for drug discovery

Brief Bioinform. 2021 Nov 5;22(6):bbab317. doi: 10.1093/bib/bbab317.

Abstract

With the rapid development of proteomics and the rapid growth in the number of drug target molecules, computer-aided drug design (CADD) has become a fundamental task in drug discovery. One of the key challenges in CADD is molecular representation. A high-quality molecular representation that reflects chemical intuition helps advance many frontier problems in drug discovery. At present, molecular representation still faces several pressing problems, such as the polysemy of substructures and the poor flow of information between atomic groups. In this work, we propose Mol2Context-vec, a deep contextualized Bi-LSTM architecture that integrates different levels of internal states to produce dynamic representations of molecular substructures. The resulting molecular context representation can capture interactions between any pair of atomic groups, especially pairs that are topologically distant. Experiments show that Mol2Context-vec achieves state-of-the-art performance on multiple benchmark datasets. In addition, the visual interpretation of Mol2Context-vec closely matches the structural properties of chemical molecules as understood by humans. These advantages indicate that Mol2Context-vec can serve as a reliable and effective tool for molecular representation. Availability: The source code is available for download at https://github.com/lol88/Mol2Context-vec.
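
The idea of integrating different levels of internal states can be illustrated with a minimal sketch. It is not taken from the Mol2Context-vec source code. It assumes an ELMo-style scheme in which each substructure token has one vector per Bi-LSTM layer and the final representation is a softmax-weighted sum of those vectors. The names `mix_layers`, `layer_scores`, and `gamma`, and the toy values, are hypothetical.

```python
import math


def softmax(scores):
    """Turn unnormalized layer scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_layers(layer_states, layer_scores, gamma=1.0):
    """Combine per-layer vectors into one vector per substructure token.

    layer_states[l][t] is the vector of token t at layer l (assumed layout).
    layer_scores holds one scalar per layer; in a trained model these
    would be learned, here they are plain inputs.
    """
    weights = softmax(layer_scores)
    n_tokens = len(layer_states[0])
    dim = len(layer_states[0][0])
    mixed = []
    for t in range(n_tokens):
        vec = [0.0] * dim
        for w, layer in zip(weights, layer_states):
            for d in range(dim):
                vec[d] += w * layer[t][d]
        mixed.append([gamma * v for v in vec])
    return mixed


# Toy input: 3 layers, 2 substructure tokens, 2-dimensional states.
layer_states = [
    [[1.0, 0.0], [0.0, 1.0]],  # layer 0 (e.g. context-free embedding)
    [[0.0, 1.0], [1.0, 0.0]],  # layer 1 (hypothetical Bi-LSTM output)
    [[1.0, 1.0], [1.0, 1.0]],  # layer 2 (hypothetical Bi-LSTM output)
]
mixed = mix_layers(layer_states, layer_scores=[0.0, 0.0, 0.0])
```

With equal scores every layer contributes one third, so each token vector is the plain average of its layer states. Unequal scores would shift the representation toward lower or higher layers.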

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cheminformatics / methods*
  • Deep Learning*
  • Drug Design / methods*
  • Drug Discovery / methods*
  • Humans
  • Models, Molecular
  • Quantum Theory
  • Structure-Activity Relationship