Bayesian approach to incorporating different types of biomedical knowledge bases into information retrieval systems for clinical decision support in precision medicine

J Biomed Inform. 2019 Oct:98:103238. doi: 10.1016/j.jbi.2019.103238. Epub 2019 Jul 10.

Abstract

By providing clinicians with information regarding treatment options for molecular subtypes of complex diseases of genetic origin, such as cancer, information retrieval (IR) systems play an important role in precision medicine. In this paper, we propose Bayesian Precision Medicine (BPM), a novel probabilistic framework for query expansion in information retrieval systems for Clinical Decision Support (CDS) in Precision Medicine (PM). Such systems can assist clinicians in selecting personalized treatments for complex diseases based on patients' genomic data, such as gene mutations. In particular, we focus on a clinical decision support scenario in which clinicians provide two types of information in their queries: (1) a short description of a patient's case, which may contain information regarding the type of cancer that the patient has as well as symptoms and demographics, and (2) gene mutations, which may contain gene names, mutation codes and mutation types. The goal of an IR system in this scenario is to rank biomedical articles from a large collection, such as MEDLINE, based on their relevance to the provided query. One of the main challenges faced by IR systems in this scenario is semantic matching of heterogeneous information (gene names, medical terminology and other query keywords) in queries and relevant biomedical articles. To address this challenge, we propose a probabilistic framework that maps the gene mutations provided in a given query onto biomedical concepts that are related to the entire query and can be effectively utilized for query expansion. BPM obtains candidate query expansion concepts from two biomedical knowledge bases, the Unified Medical Language System (UMLS) and the Drug-Gene Interaction Database (DGIdb), as well as from the top-ranked MEDLINE articles retrieved for the original query. BPM then utilizes information from the Catalog of Somatic Mutations in Cancer (COSMIC) and co-occurrence statistics in MEDLINE to assess the relatedness of candidate query expansion concepts to gene mutations and other information provided in a query. Experimental evaluation of BPM was conducted on a large subset of MEDLINE articles as well as abstracts from the American Association for Cancer Research (AACR) and American Society of Clinical Oncology (ASCO) proceedings. Experimental results on a publicly available benchmark provided by the 2017 TREC precision medicine track indicate that the proposed probabilistic framework is effective at utilizing both genomic and textual information in queries to improve the accuracy of IR systems for CDS in PM through query expansion.
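
The general idea of scoring candidate expansion concepts by their relatedness to the query terms can be illustrated with a minimal sketch. This is not the BPM model itself, whose details the abstract does not give; it is a simplified naive-Bayes-style ranking that assumes a prior estimated from concept document frequency and term likelihoods estimated from smoothed co-occurrence counts. The function name, parameters, and all counts below are hypothetical and invented purely for illustration.

```python
import math


def rank_expansion_concepts(query_terms, candidates, cooccur,
                            concept_count, total_docs, alpha=1.0):
    """Rank candidate concepts c by log P(c) + sum_t log P(t | c).

    Illustrative assumption: P(c) and P(t | c) are estimated from
    document-level counts with add-alpha smoothing.
    """
    scores = {}
    for c in candidates:
        n_c = concept_count.get(c, 0)
        # Smoothed prior: share of documents that mention the concept.
        log_score = math.log((n_c + alpha) /
                             (total_docs + alpha * len(candidates)))
        for t in query_terms:
            n_tc = cooccur.get((t, c), 0)
            # Smoothed likelihood that term t appears in a document
            # mentioning concept c (two outcomes: present or absent).
            log_score += math.log((n_tc + alpha) / (n_c + 2 * alpha))
        scores[c] = log_score
    # Highest posterior score first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy example with made-up counts (not real MEDLINE statistics).
query_terms = ["BRAF", "melanoma"]
candidates = ["vemurafenib", "aspirin"]
concept_count = {"vemurafenib": 50, "aspirin": 400}
cooccur = {
    ("BRAF", "vemurafenib"): 40,
    ("melanoma", "vemurafenib"): 35,
    ("BRAF", "aspirin"): 2,
    ("melanoma", "aspirin"): 5,
}

ranked = rank_expansion_concepts(query_terms, candidates, cooccur,
                                 concept_count, total_docs=10000)
print(ranked)
```

In this toy setting the concept that co-occurs more often with both the gene and the disease term is ranked first even though it is rarer overall, which is the kind of behavior a query expansion method would want from a relatedness score.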

Keywords: Bayesian inference; Clinical decision support; Information retrieval; Knowledge bases; Precision medicine.

MeSH terms

  • Abstracting and Indexing
  • Bayes Theorem
  • Bibliometrics
  • Data Mining / methods
  • Decision Support Systems, Clinical*
  • Humans
  • Information Systems
  • Knowledge Bases
  • MEDLINE
  • Medical Informatics / methods
  • Medical Oncology
  • Neoplasms / therapy
  • Precision Medicine / methods*
  • Probability
  • Societies, Medical
  • Unified Medical Language System