Relative molecule self-attention transformer

Łukasz Maziarka; Dawid Majchrowski; Tomasz Danel; Piotr Gaiński; Jacek Tabor; Igor Podolak; Paweł Morkisz; Stanisław Jastrzębski

doi:10.1186/s13321-023-00789-7

J Cheminform. 2024 Jan 3;16(1):3. doi: 10.1186/s13321-023-00789-7.

Authors

Łukasz Maziarka^#¹, Dawid Majchrowski², Tomasz Danel^#³, Piotr Gaiński³,⁴, Jacek Tabor³, Igor Podolak³, Paweł Morkisz², Stanisław Jastrzębski⁵

Affiliations

¹ Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Cracow, Poland. lukasz.maziarka@ii.uj.edu.pl.
² NVIDIA, 2788 San Tomas Expy, Santa Clara, CA, 95051, USA.
³ Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Cracow, Poland.
⁴ Ardigen, Podole 76, 30-394, Cracow, Poland.
⁵ Molecule.one, Al. Jerozolimskie 96, 00-807, Warsaw, Poland.

^# Contributed equally.

Abstract

Predicting molecular properties is a crucial part of drug discovery and can save considerable time and money during the drug design process. Machine learning methods for this task have become increasingly popular in recent years. Despite advances in the field, several challenges remain, such as finding an optimal pre-training procedure that improves performance on the small datasets common in drug discovery. In this paper, we tackle these problems by introducing the Relative Molecule Self-Attention Transformer for molecular representation learning. It is a novel architecture that uses relative self-attention and a 3D molecular representation to capture the interactions between atoms and bonds, enriching the backbone model with domain-specific inductive biases. Furthermore, our two-step pre-training procedure requires tuning only a few hyperparameter values to achieve performance comparable with state-of-the-art models on a wide selection of downstream tasks.

Keywords: Molecular property prediction; Molecular self-attention; Neural networks pre-training.
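
The abstract names relative self-attention over a 3D molecular representation but does not give its formulation. As a rough illustration of the general idea only, the sketch below biases ordinary scaled dot-product attention between atoms with their pairwise 3D distance. The additive distance penalty, the function names, and the `distance_weight` parameter are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_distances(coords):
    """Euclidean distance between every pair of 3D atom positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def relative_attention(queries, keys, values, coords, distance_weight=1.0):
    """Scaled dot-product attention with an additive pairwise-distance bias.

    Assumption (not from the paper): the relation between atoms i and j
    enters the attention score only as -distance_weight * dist(i, j),
    so nearby atoms receive more attention than distant ones.
    """
    d = len(queries[0])
    dists = pairwise_distances(coords)
    outputs = []
    for i, q in enumerate(queries):
        scores = [
            sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d)
            - distance_weight * dists[i][j]
            for j, k in enumerate(keys)
        ]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output channel at a time.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


# Hypothetical toy input: three atoms with 2-dimensional features.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
features = [[0.1, 0.3], [0.2, -0.1], [0.0, 0.5]]
attended = relative_attention(features, features, features, coords)
```

With `distance_weight=0.0` this reduces to plain self-attention, which ignores geometry. A larger weight makes each atom attend mostly to its spatial neighbours, which is one simple way to inject a domain-specific inductive bias of the kind the abstract mentions.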
