Few-shot biomedical named entity recognition via knowledge-guided instance generation and prompt contrastive learning

Bioinformatics. 2023 Aug 1;39(8):btad496. doi: 10.1093/bioinformatics/btad496.

Abstract

Motivation: Few-shot learning that can effectively perform named entity recognition in low-resource scenarios has raised growing attention, but it has not been widely studied yet in the biomedical field. In contrast to high-resource domains, biomedical named entity recognition (BioNER) often encounters limited human-labeled data in real-world scenarios, leading to poor generalization performance when training only a few labeled instances. Recent approaches either leverage cross-domain high-resource data or fine-tune the pre-trained masked language model using limited labeled samples to generate new synthetic data, which is easily stuck in domain shift problems or yields low-quality synthetic data. Therefore, in this article, we study a more realistic scenario, i.e. few-shot learning for BioNER.

Results: Leveraging the domain knowledge graph, we propose knowledge-guided instance generation for few-shot BioNER, which generates diverse and novel entities based on similar semantic relations of neighbor nodes. In addition, by introducing question prompt, we cast BioNER as question-answering task and propose prompt contrastive learning to improve the robustness of the model by measuring the mutual information between query-answer pairs. Extensive experiments conducted on various few-shot settings show that the proposed framework achieves superior performance. Particularly, in a low-resource scenario with only 20 samples, our approach substantially outperforms recent state-of-the-art models on four benchmark datasets, achieving an average improvement of up to 7.1% F1.

Availability and implementation: Our source code and data are available at https://github.com/cpmss521/KGPC.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking
  • Deep Learning*
  • Humans
  • Semantics
  • Software