Incorporating representation learning and multihead attention to improve biomedical cross-sentence n-ary relation extraction

BMC Bioinformatics. 2020 Jul 16;21(1):312. doi: 10.1186/s12859-020-03629-9.

Abstract

Background: Most biomedical information extraction focuses on binary relations within single sentences. However, there is strong demand for extracting n-ary relations that span multiple sentences. In the cross-sentence n-ary relation extraction task, current mainstream methods rely heavily on syntactic parsing and also ignore prior knowledge.

Results: In this paper, we propose a novel cross-sentence n-ary relation extraction method that uses multihead attention together with knowledge representations learned from a knowledge graph. Our model is built on self-attention, which can directly capture the relation between any two words regardless of their syntactic relation. In addition, our method uses entity and relation information from the knowledge base to assist relation prediction. Experiments on n-ary relation extraction show that combining context and knowledge representations significantly improves n-ary relation extraction performance, and our method achieves results comparable to state-of-the-art methods.
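The abstract does not give the model's equations. As a rough illustration of why self-attention needs no syntactic parse, the following is a minimal sketch of standard multihead scaled dot-product self-attention (as in the Transformer), written in plain Python. It assumes token embeddings are already available; the dimensions, the random untrained weights, and all function names are hypothetical and are not the authors' implementation.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    mx = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    # Hypothetical untrained weights, drawn at random purely for illustration.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def multihead_self_attention(x, num_heads, rng):
    """Self-attention over token vectors x (seq_len x d_model).

    Every token scores every other token directly, so two distant words are
    connected in one step without any dependency parse.
    """
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    heads, attn_maps = [], []
    for _ in range(num_heads):
        wq = random_matrix(d_model, d_head, rng)
        wk = random_matrix(d_model, d_head, rng)
        wv = random_matrix(d_model, d_head, rng)
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        scores = matmul(q, transpose(k))  # seq_len x seq_len
        scale = math.sqrt(d_head)
        weights = [softmax([s / scale for s in row]) for row in scores]
        attn_maps.append(weights)
        heads.append(matmul(weights, v))  # seq_len x d_head
    # Concatenate the heads along the feature axis, then project back to d_model.
    concat = [sum((h[i] for h in heads), []) for i in range(len(x))]
    wo = random_matrix(d_model, d_model, rng)
    return matmul(concat, wo), attn_maps

# Usage: 5 hypothetical token vectors of size 8, split across 2 heads.
rng = random.Random(0)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
out, maps = multihead_self_attention(tokens, num_heads=2, rng=rng)
print(len(out), len(out[0]), len(maps))  # 5 8 2
```

Each head produces its own attention map, so different heads can, in principle, focus on different word-to-word dependencies across sentence boundaries.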

Conclusions: We explored a novel method for cross-sentence n-ary relation extraction. Unlike previous approaches, our method operates directly on the word sequence and learns to model the internal structure of sentences. In addition, we introduce knowledge representations learned from a knowledge graph into cross-sentence n-ary relation extraction. Experiments based on knowledge representation learning show that entities and relations can be extracted from the knowledge graph and that encoding this knowledge provides consistent benefits.
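The abstract does not say which knowledge representation learning method is used. The sketch below assumes TransE, one common choice, in which a true triple (head, relation, tail) should satisfy h + r ≈ t. It also shows the simplest possible way to combine knowledge and context: concatenating a sentence-level context vector with each entity's knowledge-graph embedding. The entity names, the two-dimensional toy embeddings, and the concatenation fusion are all hypothetical and hand-picked for illustration.

```python
import math

def l2(v):
    return math.sqrt(sum(x * x for x in v))

def transe_score(h, r, t):
    """TransE distance ||h + r - t||; a smaller value means a more plausible triple."""
    return l2([hi + ri - ti for hi, ri, ti in zip(h, r, t)])

def margin_loss(pos, neg, emb_e, emb_r, margin=1.0):
    """Hinge loss that pushes a true triple to be closer than a corrupted one."""
    h, r, t = pos
    h2, r2, t2 = neg
    d_pos = transe_score(emb_e[h], emb_r[r], emb_e[t])
    d_neg = transe_score(emb_e[h2], emb_r[r2], emb_e[t2])
    return max(0.0, margin + d_pos - d_neg)

def fuse(context_vec, entity_names, emb_e):
    """Concatenate the context vector with each entity's KG embedding (assumed fusion)."""
    fused = list(context_vec)
    for name in entity_names:
        fused.extend(emb_e[name])
    return fused

# Hypothetical toy embeddings, chosen so that the positive triple fits exactly.
emb_e = {"gefitinib": [1.0, 0.0], "EGFR": [1.0, 1.0], "L858R": [0.0, 2.0]}
emb_r = {"targets": [0.0, 1.0]}

pos = ("gefitinib", "targets", "EGFR")
neg = ("gefitinib", "targets", "L858R")  # corrupted tail
pos_d = transe_score(emb_e[pos[0]], emb_r[pos[1]], emb_e[pos[2]])
neg_d = transe_score(emb_e[neg[0]], emb_r[neg[1]], emb_e[neg[2]])
loss = margin_loss(pos, neg, emb_e, emb_r)
features = fuse([0.5, -0.5], ["gefitinib", "EGFR", "L858R"], emb_e)
print(pos_d, round(neg_d, 3), loss, len(features))  # 0.0 1.414 0.0 8
```

In a real system the embeddings would be trained by minimizing the margin loss over many triples, and the fused feature vector would feed a relation classifier; concatenation is used here only because it is the simplest fusion to show.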

Keywords: Biomedical n-ary relation; Multihead attention; Representation learning.

MeSH terms

  • Algorithms*
  • Biomedical Research*
  • Humans
  • Knowledge Bases
  • Machine Learning
  • Models, Theoretical
  • Reproducibility of Results