Data-balanced transformer for accelerated ionizable lipid nanoparticles screening in mRNA delivery

Brief Bioinform. 2024 Mar 27;25(3):bbae186. doi: 10.1093/bib/bbae186.

Abstract

Ionizable lipid nanoparticles (LNPs) are widely used in clinical applications for messenger RNA (mRNA) delivery, yet screening LNPs efficiently remains a challenge for mRNA drug delivery systems. Traditional screening methods require substantial experimental time and incur high research and development costs. To accelerate the early development stage of LNPs, we propose TransLNP, a transformer-based transfection prediction model designed to aid in the selection of LNPs for mRNA drug delivery systems. TransLNP uses two types of molecular information to capture the relationship between structure and transfection efficiency: coarse-grained atomic sequence information and fine-grained atomic spatial relationship information. Because existing LNP experimental data are scarce, we find that pretraining the molecular model is crucial for predicting LNP properties; pretraining is carried out by reconstructing atomic 3D coordinates and predicting masked atoms. In addition, data imbalance is particularly prominent in the real-world exploration of LNPs. We introduce the BalMol block to address this problem by smoothing the distribution of labels and molecular features. Our approach outperforms state-of-the-art methods in transfection property prediction under both random and scaffold data splitting. Additionally, we establish a relationship between molecular structural similarity and transfection differences and select 4267 pairs of molecular transfection cliffs, that is, pairs of molecules that exhibit high structural similarity but significant differences in transfection efficiency, thereby revealing the primary source of prediction errors. The code, model and data are publicly available at https://github.com/wklix/TransLNP.
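
The definition of a transfection cliff given in the abstract (high structural similarity, large difference in transfection efficiency) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which similarity measure or thresholds were used, so the Tanimoto similarity over sets of fingerprint bits, the function names `tanimoto` and `find_transfection_cliffs`, the thresholds `sim_threshold` and `diff_threshold`, and the toy data are all assumptions chosen for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_transfection_cliffs(molecules, sim_threshold=0.8, diff_threshold=1.0):
    """Return (name_i, name_j, similarity, difference) for each cliff pair.

    `molecules` maps a name to (fingerprint_bits, transfection_value).
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    cliffs = []
    for (name_i, (fp_i, y_i)), (name_j, (fp_j, y_j)) in combinations(
        molecules.items(), 2
    ):
        sim = tanimoto(fp_i, fp_j)
        diff = abs(y_i - y_j)
        # A cliff pair is structurally similar but differs strongly in label.
        if sim >= sim_threshold and diff >= diff_threshold:
            cliffs.append((name_i, name_j, sim, diff))
    return cliffs


# Hypothetical toy data: fingerprint bits and transfection values are made up.
toy = {
    "lipid_A": (frozenset({1, 2, 3, 4, 5}), 0.2),
    "lipid_B": (frozenset({1, 2, 3, 4, 5, 6}), 2.5),
    "lipid_C": (frozenset({10, 11, 12}), 2.4),
}
cliffs = find_transfection_cliffs(toy)
```

In this toy set only the pair (lipid_A, lipid_B) qualifies: the two share five of six fingerprint bits but differ widely in transfection value, while lipid_C has a similar value to lipid_B but no structural overlap. In practice the fingerprint bits would come from a cheminformatics toolkit rather than hand-written sets.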

Keywords: data imbalance; ionizable lipid nanoparticles; transfection cliffs.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Drug Delivery Systems
  • Humans
  • Lipids* / chemistry
  • Liposomes*
  • Models, Molecular
  • Nanoparticles* / chemistry
  • RNA, Messenger* / chemistry
  • RNA, Messenger* / genetics
  • Transfection

Substances

  • RNA, Messenger
  • Lipids
  • Lipid Nanoparticles
  • Liposomes