A data-centric way to improve entity linking in knowledge-based question answering

PeerJ Comput Sci. 2023 Feb 9;9:e1233. doi: 10.7717/peerj-cs.1233. eCollection 2023.

Abstract

Entity linking in knowledge-based question answering (KBQA) aims to construct a mapping between a mention in a natural language question and an entity in the knowledge base. Most entity linking research focuses on long text, whereas entity linking in open-domain KBQA deals primarily with short text. Many recent models attempt to extract better features from the raw data by adjusting the neural network structure; however, such models perform well on only a few datasets. We therefore concentrate on the data rather than the model itself and propose DME (Domain information Mining and Explicit expressing), a model that extracts domain information from short text and appends it to the data. An entity linking model is then enhanced by training on the DME-processed data. In addition, we developed a novel negative sampling approach to make the model more robust. We conducted experiments on KgCLUE, a large Chinese open-source benchmark, to assess model performance with DME-processed data. The experiments showed that our approach improves entity linking in the baseline models without any change to their structure, and that it is demonstrably transferable to other datasets.
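The two data-centric steps described above can be illustrated with a minimal sketch: appending mined domain information to a short question, and drawing negative entities for contrastive training. All names here (the toy knowledge base, `append_domain`, `sample_negatives`) are hypothetical; the paper's DME mines domain information from the data itself rather than from a hand-built lookup like this.

```python
import random

# Hypothetical toy knowledge base mapping entities to domain labels.
# (Illustrative only; DME derives such domain information automatically.)
KB_DOMAINS = {
    "Beijing": "geography",
    "Python": "technology",
    "Mozart": "music",
}

def append_domain(question: str, mention: str) -> str:
    """Explicitly express mined domain info by appending it to the short text."""
    domain = KB_DOMAINS.get(mention, "unknown")
    return f"{question} [domain: {domain}]"

def sample_negatives(gold: str, k: int, rng: random.Random) -> list:
    """Draw k negative entities: any KB entity other than the gold one."""
    candidates = [e for e in KB_DOMAINS if e != gold]
    return rng.sample(candidates, min(k, len(candidates)))

augmented = append_domain("Where is Beijing located?", "Beijing")
negatives = sample_negatives("Beijing", 2, random.Random(0))
```

An entity linking model trained on `augmented`-style inputs sees the domain signal explicitly, and the sampled negatives supply the contrasting examples that the robustness step requires.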

Keywords: Natural language processing; Negative sampling; Entity linking; Knowledge-based question answering.

Grants and funding

This research is supported by the Science and Technology Research Program of the Department of Science and Technology of Henan Province (approval No.: 222102210081). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.