Selective UMLS knowledge infusion for biomedical question answering

Sci Rep. 2023 Aug 30;13(1):14214. doi: 10.1038/s41598-023-41423-8.

Abstract

One of the applications of artificial intelligence in the biomedical field is knowledge-intensive question answering. Because domain expertise is particularly crucial in this field, we propose a method for efficiently infusing biomedical knowledge into pretrained language models, ultimately targeting biomedical question answering. Transferring all the semantics of a large knowledge graph into the entire model requires too many parameters, increasing computational cost and time. We investigate an efficient approach that leverages adapters to inject Unified Medical Language System (UMLS) knowledge into pretrained language models, and we question the need to use all of the semantics in the knowledge graph. This study focuses on strategies for partitioning the knowledge graph and either discarding or merging some partitions for more efficient pretraining. According to the results on three biomedical question-answering fine-tuning datasets, the adapters pretrained on semantically partitioned groups performed more efficiently in terms of evaluation metrics, required parameters, and time. The results also show that discarding the groups with fewer concepts is the better choice for small datasets, whereas merging those groups is better for large datasets. Furthermore, the metrics differ only slightly across formulations, demonstrating that the adapter methodology is rather insensitive to the group formulation.
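The partition-then-discard-or-merge strategy described above can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the authors' code: the concept names, semantic-group labels, `min_size` threshold, and the `"MERGED"` catch-all label are all hypothetical placeholders standing in for UMLS concepts and their semantic groups.

```python
def partition_concepts(concept_groups, min_size, strategy="discard"):
    """Group concepts by semantic group, then either discard groups
    with fewer than `min_size` concepts or merge them into a single
    catch-all group (a sketch of the strategy in the abstract)."""
    groups = {}
    for concept, group in concept_groups:
        groups.setdefault(group, []).append(concept)
    # Keep the semantically coherent groups that are large enough.
    kept = {g: cs for g, cs in groups.items() if len(cs) >= min_size}
    # Collect concepts from the undersized groups.
    small = [c for g, cs in groups.items() if len(cs) < min_size for c in cs]
    if strategy == "merge" and small:
        kept["MERGED"] = small   # merge small groups into one partition
    return kept                  # "discard" simply drops them

# Hypothetical (concept, semantic group) pairs for illustration:
pairs = [("aspirin", "Chemicals"), ("ibuprofen", "Chemicals"),
         ("heart", "Anatomy"), ("fever", "Disorders"),
         ("cough", "Disorders"), ("BRCA1", "Genes")]

discarded = partition_concepts(pairs, min_size=2)            # small groups dropped
merged = partition_concepts(pairs, min_size=2, strategy="merge")
```

Under this sketch, one adapter would then be pretrained per surviving partition, which is how the approach avoids spending parameters on every semantic group in the full knowledge graph.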

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Benchmarking
  • Knowledge
  • Language
  • Unified Medical Language System*