Enhancing Biomedical ReQA With Adversarial Hard In-Batch Negative Samples

Bo Zhao; Jun Bai; Chen Li; Jianfei Zhang; Wenge Rong; Yuanxin Ouyang; Zhang Xiong

doi:10.1109/TCBB.2023.3261315

IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):2933-2944. doi: 10.1109/TCBB.2023.3261315. Epub 2023 Oct 9.

PMID: 37030792

Abstract

Question answering (QA) plays a vital role in biomedical natural language processing. Among question answering tasks, retrieval question answering (ReQA) aims to retrieve the correct answer directly from a set of candidates and has attracted much attention in the community for its efficiency. Recently, researchers have introduced ReQA into the biomedical domain as BioReQA. BioReQA models typically rely on a dual-encoder to obtain semantic representations and are trained following the settings of dense retrieval. They normally use easy in-batch negative samples during training, which avoids the extra forward passes and GPU memory required to encode additional negative samples. Hard negative samples, however, have proved more important to the overall performance of BioReQA tasks. In this research, we therefore focus on effectively constructing hard in-batch negative samples. Inspired by the classic linear assignment problem, we propose an Iterative Linear Assignment Grouping (ILAG) algorithm to construct hard in-batch negative samples. To further enhance performance on the resulting hard batches in a low-resource scenario, we also employ adversarial training to increase the difficulty of batches. Extensive experiments show the promising potential of the proposed method for biomedical retrieval question answering.
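To make the notion of in-batch negative samples concrete, the following is a minimal sketch of the usual dual-encoder training objective, in which the answer paired with each question is the positive and every other answer in the same batch serves as a negative. It is an illustration only and not the authors' implementation: the function names, the dot-product scoring, and the toy 2-dimensional vectors are all assumptions, and the ILAG grouping and adversarial training steps are not reproduced here.

```python
import math


def dot(u, v):
    """Dot-product similarity between two embedding vectors (assumed scoring)."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_negative_loss(question_vecs, answer_vecs):
    """Mean softmax cross-entropy over a batch.

    answer_vecs[i] is treated as the positive for question_vecs[i];
    all other answers in the batch act as negatives, so no extra
    encoding is needed beyond the batch itself.
    """
    total = 0.0
    for i, q in enumerate(question_vecs):
        scores = [dot(q, a) for a in answer_vecs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(question_vecs)


# Hypothetical toy embeddings: in the "easy" batch the two pairs are
# unrelated; in the "hard" batch the two pairs are semantically close.
easy_vecs = [[1.0, 0.0], [0.0, 1.0]]
hard_vecs = [[1.0, 0.0], [0.8, 0.6]]

easy_loss = in_batch_negative_loss(easy_vecs, easy_vecs)
hard_loss = in_batch_negative_loss(hard_vecs, hard_vecs)
print(f"easy batch loss: {easy_loss:.4f}")
print(f"hard batch loss: {hard_loss:.4f}")
```

In this toy setting the batch whose members resemble one another yields the larger loss, which is the sense in which grouping similar examples into the same batch produces harder negatives without encoding additional samples.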

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Information Storage and Retrieval*
Natural Language Processing
Semantics