Extracting biomedical relation from cross-sentence text using syntactic dependency graph attention network

Xueyang Zhou; Qiming Fu; Jianping Chen; Lanhui Liu; Yunzhe Wang; You Lu; Hongjie Wu

doi:10.1016/j.jbi.2023.104445

J Biomed Inform. 2023 Aug:144:104445. doi: 10.1016/j.jbi.2023.104445. Epub 2023 Jul 17.

Authors

Xueyang Zhou¹, Qiming Fu², Jianping Chen³, Lanhui Liu⁴, Yunzhe Wang¹, You Lu¹, Hongjie Wu⁵

Affiliations

¹ Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, China.
² Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, China. Electronic address: fqm_1@126.com.
³ Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, China; Architecture and Urban Planning, Suzhou University of Science and Technology, Suzhou 215009, China; Chongqing Industrial Big Data Innovation Center Co., Ltd., Chongqing 4007071, China. Electronic address: alanjpchen@aliyun.com.
⁴ Chongqing Industrial Big Data Innovation Center Co., Ltd., Chongqing 4007071, China.
⁵ Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China.

PMID: 37467835

Abstract

In biomedical literature, cross-sentence text often expresses rich knowledge, and extracting interaction relations between entities from such text is of great significance to biomedical research. However, compared with a single sentence, cross-sentence text has a longer sequence length, so research on cross-sentence information extraction should focus more on learning the structural information of context dependencies. Handling the global dependencies and structural information of long sequences effectively remains a challenge, and graph-oriented modeling methods have recently received increasing attention. In this paper, we propose a new graph attention network guided by syntactic dependency relationships (SR-GAT) for extracting biomedical relations from cross-sentence text. It allows each node to attend to other nodes in its neighborhood, regardless of the sequence length. The attention weight between nodes is given by a syntactic relation graph probability network (SR-GPR), which encodes the syntactic dependency between nodes and guides the graph attention mechanism to learn information about the dependency structure. The learned feature representation retains information about node-to-node syntactic dependencies and can further discover global dependencies effectively. Experimental results on a publicly available biomedical dataset demonstrate that our method achieves state-of-the-art performance while requiring significantly fewer computational resources. Specifically, in the "drug-mutation" relation extraction task, our method achieves an accuracy of 93.78% for binary classification and 92.14% for multi-class classification. In the "drug-gene-mutation" relation extraction task, our method achieves an accuracy of 93.22% for binary classification and 92.28% for multi-class classification. Across all relation extraction tasks, our method improves accuracy by an average of 0.49% compared to the best existing model. Furthermore, our method achieved an accuracy of 69.5% in text classification, surpassing most existing models and demonstrating robust generalization across different domains without additional fine-tuning.
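
The neighborhood attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' SR-GAT or SR-GPR implementation: the function name `dependency_masked_attention`, the dot-product scoring, and the `rel_bias` matrix (a stand-in for whatever syntactic-relation scores a network such as SR-GPR might produce) are all assumptions made for illustration. The sketch only shows the general idea that each token attends to its neighbors in a dependency graph, however far apart they are in the sequence.

```python
import numpy as np

def dependency_masked_attention(h, adj, rel_bias):
    """One attention step restricted to dependency-graph neighbors.

    h        : (n, d) node (token) feature matrix
    adj      : (n, n) 0/1 adjacency matrix from a dependency parse
    rel_bias : (n, n) hypothetical per-edge scores for syntactic relations
    """
    n, d = h.shape
    # Scaled dot-product scores plus the (assumed) syntactic-relation bias.
    scores = h @ h.T / np.sqrt(d) + rel_bias
    # Keep only graph neighbors; self-loops guarantee a non-empty row.
    mask = (adj + np.eye(n)) > 0
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each node's neighborhood.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each node's new feature is a weighted sum of its neighbors' features.
    return weights @ h, weights

# Toy example: four tokens linked in a chain 0-1-2-3 (undirected edges).
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
rel_bias = np.zeros((4, 4))  # no relation-specific bias in this toy case

out, weights = dependency_masked_attention(h, adj, rel_bias)
```

In this toy graph, tokens 0 and 3 are not connected, so the weight between them is exactly zero, while every row of `weights` still sums to one over that token's neighborhood.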

Keywords: Biomedical relation extraction; Cross-sentence text; Feature engineering; Graph-oriented; Syntactic dependency graph attention.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biomedical Research*
Information Storage and Retrieval
Language*