Advancing entity recognition in biomedicine via instruction tuning of large language models

Vipina K Keloth; Yan Hu; Qianqian Xie; Xueqing Peng; Yan Wang; Andrew Zheng; Melih Selek; Kalpana Raja; Chih Hsuan Wei; Qiao Jin; Zhiyong Lu; Qingyu Chen; Hua Xu

doi:10.1093/bioinformatics/btae163

Bioinformatics. 2024 Mar 29;40(4):btae163. doi: 10.1093/bioinformatics/btae163.

Authors

Vipina K Keloth¹, Yan Hu², Qianqian Xie¹, Xueqing Peng¹, Yan Wang¹, Andrew Zheng³, Melih Selek⁴, Kalpana Raja¹, Chih Hsuan Wei⁵, Qiao Jin⁵, Zhiyong Lu⁵, Qingyu Chen¹˒⁵, Hua Xu¹

Affiliations

¹ Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT-06510, United States.
² McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX-77030, United States.
³ William P. Clements High School, Sugar Land, TX-77479, United States.
⁴ Stephen F. Austin High School, Sugar Land, TX-77498, United States.
⁵ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD-20894, United States.

Abstract

Motivation: Large Language Models (LLMs) have the potential to revolutionize the field of Natural Language Processing, excelling not only in text generation and reasoning tasks but also in their ability for zero/few-shot learning, swiftly adapting to new tasks with minimal fine-tuning. LLMs have also demonstrated great promise in biomedical and healthcare applications. However, when it comes to Named Entity Recognition (NER), particularly within the biomedical domain, LLMs fall short of the effectiveness exhibited by fine-tuned domain-specific models. One key reason is that NER is typically conceptualized as a sequence labeling task, whereas LLMs are optimized for text generation and reasoning tasks.

Results: We developed an instruction-based learning paradigm that transforms biomedical NER from a sequence labeling task into a generation task. This paradigm is end-to-end and streamlines the training and evaluation process by automatically repurposing pre-existing biomedical NER datasets. We further developed BioNER-LLaMA using the proposed paradigm with LLaMA-7B as the foundational LLM. We conducted extensive testing on BioNER-LLaMA across three widely recognized biomedical NER datasets, consisting of entities related to diseases, chemicals, and genes. The results revealed that BioNER-LLaMA consistently achieved F1-scores 5% to 30% higher than those of GPT-4 with few-shot learning on datasets with different biomedical entities. We show that a general-domain LLM can match the performance of rigorously fine-tuned PubMedBERT models and PMC-LLaMA, a biomedical-specific language model. Our findings underscore the potential of our proposed paradigm in developing general-domain LLMs that can rival state-of-the-art performance in multi-task, multi-domain scenarios in biomedical and health applications.
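As an illustration of what recasting NER as a generation task can look like, the following minimal Python sketch converts a BIO-tagged sentence into an instruction/input/output record and scores a generated answer with a simple entity-level F1. It is not the authors' implementation: the prompt wording, the `'; '` separator, the `none` placeholder, the function names, and the exact-string set matching are all assumptions made for this example, and the sample sentence is invented.

```python
from typing import Dict, List


def bio_to_entities(tokens: List[str], tags: List[str], entity_type: str) -> List[str]:
    """Collect mention strings of one entity type from BIO-tagged tokens."""
    entities: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{entity_type}":
            # A new mention starts; close any mention still open.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == f"I-{entity_type}" and current:
            current.append(token)
        else:
            # "O" or a tag of another type ends the open mention.
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def to_instruction_example(tokens: List[str], tags: List[str], entity_type: str) -> Dict[str, str]:
    """Build one instruction-tuning record (hypothetical prompt format)."""
    entities = bio_to_entities(tokens, tags, entity_type)
    return {
        "instruction": (
            f"List every {entity_type} mention in the input text, "
            "separated by '; '. Answer 'none' if there are no mentions."
        ),
        "input": " ".join(tokens),
        "output": "; ".join(entities) or "none",
    }


def entity_f1(gold: List[str], predicted: List[str]) -> float:
    """Entity-level F1 over unique mention strings (a simplification)."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Invented sample sentence with two entity types.
tokens = ["Aspirin", "can", "cause", "gastric", "ulcers", "."]
tags = ["B-Chemical", "O", "O", "B-Disease", "I-Disease", "O"]
print(to_instruction_example(tokens, tags, "Disease"))
```

Because each record targets a single entity type, an existing sequence-labeled corpus can be repurposed automatically by calling `to_instruction_example` once per sentence and entity type. Matching on mention strings ignores character offsets and repeated mentions, so a span-based evaluation would need the original positions.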

Availability and implementation: Datasets and other resources are available at https://github.com/BIDS-Xu-Lab/BioNER-LLaMA.

MeSH terms

Animals
Camelids, New World*
Deep Learning*
Language
Natural Language Processing

Grants and funding

R01AG078154/NH/NIH HHS/United States