A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering

J Biomed Inform. 2017 Apr:68:96-103. doi: 10.1016/j.jbi.2017.03.001. Epub 2017 Mar 7.

Abstract

Background and objective: Passage retrieval, the identification of top-ranked passages that may contain the answer to a given biomedical question, is a crucial component of any biomedical question answering (QA) system. Passage retrieval in open-domain QA is a longstanding challenge that has been widely studied over the last decades. However, it still requires further effort in biomedical QA. In this paper, we present a new biomedical passage retrieval method based on Stanford CoreNLP sentence/passage length, a probabilistic information retrieval (IR) model, and UMLS concepts.

Methods: In the proposed method, we first use our document retrieval system, based on the PubMed search engine and UMLS similarity, to retrieve documents relevant to a given biomedical question. We then take the abstracts of the retrieved documents and use the Stanford CoreNLP sentence splitter to produce a set of sentences, i.e., candidate passages. Finally, using stemmed words and UMLS concepts as features for the BM25 model, we compute the similarity score between the biomedical question and each candidate passage and keep the N top-ranked passages.
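
The final ranking step can be illustrated with a minimal Python sketch of Okapi BM25 over pre-tokenized passages. It is not the authors' implementation: the function name `bm25_rank`, the parameter values `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the `cui_...` tokens standing in for UMLS concept identifiers are all assumptions made for illustration. Stemming and concept extraction are assumed to have happened upstream, so each passage is simply a list of feature tokens.

```python
import math
from collections import Counter


def bm25_rank(query_terms, passages, k1=1.2, b=0.75, top_n=10):
    """Rank tokenized passages against a tokenized query with Okapi BM25.

    Returns a list of (passage_index, score) pairs, best first.
    k1 and b are common default values, not values taken from the paper.
    """
    n = len(passages)
    if n == 0:
        return []
    avg_len = sum(len(p) for p in passages) / n

    # Document frequency: number of passages containing each feature.
    df = Counter()
    for p in passages:
        df.update(set(p))

    scored = []
    for idx, p in enumerate(passages):
        tf = Counter(p)
        score = 0.0
        for term in set(query_terms):
            if term not in tf:
                continue
            # Smoothed IDF (always positive); one of several BM25 variants.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(p) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, idx))

    # Sort by descending score; ties keep the original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(idx, score) for score, idx in scored[:top_n]]


# Hypothetical features: stemmed words plus placeholder concept tokens.
passages = [
    ["insulin", "regul", "glucos", "cui_glucose"],
    ["aspirin", "reduc", "fever"],
    ["glucos", "level", "diabet", "cui_glucose", "cui_diabetes"],
]
query = ["glucos", "diabet", "cui_diabetes"]
ranked = bm25_rank(query, passages, top_n=3)
```

Treating word stems and concept identifiers as tokens in one shared vocabulary is only one way to combine the two feature types; the abstract does not say how they are weighted against each other.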

Results: Experimental evaluations performed on large standard datasets provided by the BioASQ challenge show that the proposed method performs well compared with the current state-of-the-art methods, significantly outperforming them by an average of 6.84% in terms of mean average precision (MAP).
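
For readers unfamiliar with the metric, the following Python sketch computes MAP using the textbook definition: average precision is the mean of the precision values at each rank where a relevant passage appears, and MAP is the mean of that quantity over all questions. The function names and the toy identifiers are hypothetical, and the official BioASQ evaluation may use a different variant, so this is an illustration of the measure rather than a reproduction of the reported evaluation.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    seen = set()
    for rank, pid in enumerate(ranked_ids, start=1):
        # Count each relevant passage once, at the first rank it appears.
        if pid in relevant and pid not in seen:
            seen.add(pid)
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, g) for r, g in runs) / len(runs)


# Two toy questions with made-up passage identifiers.
runs = [
    (["a", "b", "c"], ["a", "c"]),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], ["y"]),            # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```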

Conclusion: We have proposed an efficient passage retrieval method that can be used to retrieve relevant passages in biomedical QA systems with high mean average precision.

Keywords: Biomedical informatics; Biomedical passage retrieval; Biomedical question answering system; Natural language processing; Probabilistic information retrieval model; Unified medical language system.

MeSH terms

  • Information Storage and Retrieval*
  • Models, Statistical
  • Natural Language Processing*
  • PubMed*
  • Unified Medical Language System*