Toward Unified AI Drug Discovery with Multimodal Knowledge

Yizhen Luo; Xing Yi Liu; Kai Yang; Kui Huang; Massimo Hong; Jiahuan Zhang; Yushuai Wu; Zaiqing Nie

doi:10.34133/hds.0113

Health Data Sci. 2024 Feb 23:4:0113. doi: 10.34133/hds.0113. eCollection 2024.

Authors

Yizhen Luo¹˒², Xing Yi Liu¹, Kai Yang¹, Kui Huang¹˒³, Massimo Hong¹˒², Jiahuan Zhang¹, Yushuai Wu¹, Zaiqing Nie¹˒⁴

Affiliations

¹ Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China.
² Department of Computer Science and Technology, Tsinghua University, Beijing, China.
³ School of Software and Microelectronics, Peking University, Beijing, China.
⁴ Beijing Academy of Artificial Intelligence (BAAI), Beijing, China.

Abstract

Background: In real-world drug discovery, human experts typically draw their knowledge of drugs and proteins from multimodal sources, including molecular structures, structured knowledge from knowledge bases, and unstructured knowledge from biomedical literature. Existing multimodal approaches in AI drug discovery integrate either structured or unstructured knowledge, but not both, which compromises a holistic understanding of biomolecules. Moreover, they fail to address the missing modality problem, in which multimodal information is unavailable for novel drugs and proteins.

Methods: In this work, we present KEDD, a unified, end-to-end deep learning framework that jointly incorporates both structured and unstructured knowledge for a wide range of AI drug discovery tasks. The framework first uses independent representation learning models to extract the underlying characteristics of each modality. It then applies a feature fusion technique to compute the prediction results. To mitigate the missing modality problem, we leverage sparse attention and a modality masking technique to reconstruct the missing features from the most relevant molecules.

Results: Benefiting from structured and unstructured knowledge, our framework achieves a deeper understanding of biomolecules. KEDD outperforms state-of-the-art models by an average of 5.2% on drug-target interaction prediction, 2.6% on drug property prediction, 1.2% on drug-drug interaction prediction, and 4.1% on protein-protein interaction prediction. Through qualitative analysis, we reveal KEDD's promising potential in assisting real-world applications.

Conclusions: By incorporating biomolecular expertise from multimodal knowledge, KEDD shows promise in accelerating drug discovery.
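
The pipeline outlined in the Methods (per-modality features, reconstruction of a missing modality from the most relevant molecules, then feature fusion) can be illustrated with a small sketch. This is not KEDD's implementation: the abstract does not specify the encoders, the exact sparse attention variant, or the fusion operator. The sketch assumes a top-k softmax as a simple stand-in for sparse attention, plain concatenation as the fusion step, and random dropping of the knowledge feature as the masking step. All names (`reconstruct_missing`, `mask_modality`, `fuse`, `encode_molecule`, `bank_struct`, `bank_knowledge`, `top_k`) are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product used here as the similarity score between two embeddings."""
    return sum(a * b for a, b in zip(u, v))


def reconstruct_missing(query_struct, bank_struct, bank_knowledge, top_k=2):
    """Rebuild a missing knowledge feature from the top_k most similar molecules.

    Assumption: top-k softmax attention stands in for the paper's sparse
    attention. Only the top_k reference molecules receive nonzero weight.
    """
    scores = [dot(query_struct, s) for s in bank_struct]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    weights = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(weights.values())
    dim = len(bank_knowledge[0])
    return [
        sum(weights[i] / total * bank_knowledge[i][d] for i in top)
        for d in range(dim)
    ]


def mask_modality(knowledge_feat, drop_prob, rng):
    """Training-time masking (assumed form): hide the feature with probability drop_prob."""
    return None if rng.random() < drop_prob else knowledge_feat


def fuse(struct_feat, knowledge_feat):
    """Feature fusion by concatenation (an assumption; other operators are possible)."""
    return list(struct_feat) + list(knowledge_feat)


def encode_molecule(struct_feat, knowledge_feat, bank_struct, bank_knowledge, top_k=2):
    """Fill in the knowledge feature if it is missing, then fuse the modalities."""
    if knowledge_feat is None:
        knowledge_feat = reconstruct_missing(
            struct_feat, bank_struct, bank_knowledge, top_k
        )
    return fuse(struct_feat, knowledge_feat)


if __name__ == "__main__":
    rng = random.Random(0)
    bank_struct = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # toy structure embeddings
    bank_knowledge = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy knowledge features
    novel = [1.0, 0.0]                                      # novel drug, no knowledge entry
    masked = mask_modality([0.3, 0.7], drop_prob=0.5, rng=rng)
    print(encode_molecule(novel, None, bank_struct, bank_knowledge))
    print(encode_molecule(novel, masked, bank_struct, bank_knowledge))
```

In this toy setting, a novel molecule with no knowledge-base entry still receives a full-length fused representation, because its knowledge feature is borrowed from the structurally closest reference molecules. Masking during training would expose a downstream predictor to such reconstructed features before it encounters genuinely missing ones.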