UGDAS: Unsupervised graph-network based denoiser for abstractive summarization in biomedical domain

Methods. 2022 Jul:203:160-166. doi: 10.1016/j.ymeth.2022.03.012. Epub 2022 Apr 2.

Abstract

Abstractive summarization models generate summaries auto-regressively, but summary quality is often degraded by noise in the source text. Learning cross-sentence relations is a crucial step in this task, and graph-based networks are more effective at capturing relationships between sentences. Moreover, domain knowledge is important for distinguishing noise in text from a specialized domain. This paper proposes a novel model structure called UGDAS, which combines a sentence-level denoiser based on an unsupervised graph network with an auto-regressive generator. The model uses domain knowledge and sentence position information to denoise the original text and thereby improve the quality of the generated summaries. We evaluate it on the text summarization task using the recently introduced CORD-19 dataset (COVID-19 Open Research Dataset), which contains large-scale data on coronaviruses. The experimental results show that our model achieves the state-of-the-art (SOTA) result on the CORD-19 dataset and outperforms the related baseline models on the PubMed Abstract dataset.
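
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of an unsupervised, graph-based sentence-level denoiser. It is not the UGDAS model: it swaps in a plain TextRank-style ranking over bag-of-words cosine similarity, and treats "domain knowledge" as a hypothetical set of domain terms and "position information" as a simple 1/(position) prior. The names `denoise`, `domain_terms`, `STOPWORDS`, and all parameter values are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "for", "we", "their", "and", "to", "in"}


def tokenize(sentence):
    """Lowercase the sentence and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", sentence.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def denoise(sentences, domain_terms, keep=3, damping=0.85, iterations=50):
    """Keep the `keep` highest-ranked sentences, returned in their original order.

    Assumption: sentences that are weakly connected to the rest of the text,
    appear late, and contain no domain terms are treated as noise.
    """
    n = len(sentences)
    if n <= keep:
        return list(sentences)

    bags = [Counter(tokenize(s)) for s in sentences]

    # Sentence graph: edge weight = lexical similarity between two sentences.
    weights = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]

    # Prior combining sentence position (earlier is higher) and domain-term counts.
    prior = [1.0 / (i + 1) + sum(bags[i][t] for t in domain_terms) for i in range(n)]
    total = sum(prior)
    prior = [p / total for p in prior]

    # Personalized PageRank by power iteration. Isolated sentences pass on no
    # score, so the scores need not sum to 1; only their ranking is used.
    scores = prior[:]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(weights[j][i] / out_sums[j] * scores[j]
                           for j in range(n) if out_sums[j] > 0)
            updated.append((1 - damping) * prior[i] + damping * incoming)
        scores = updated

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]
```

In a pipeline of the kind the abstract describes, the retained sentences would then be passed to an auto-regressive generator; that stage is outside the scope of this sketch.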

Keywords: Abstractive summarization; Domain knowledge; Graph-network; Pre-trained language model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19*
  • Concept Formation
  • Humans
  • Semantics*