Querying semantic catalogues of biomedical databases

J Biomed Inform. 2023 Jan:137:104272. doi: 10.1016/j.jbi.2022.104272. Epub 2022 Dec 20.

Abstract

Background: Secondary use of health data is a valuable source of knowledge that boosts observational studies, leading to important discoveries in the medical and biomedical sciences. The fundamental guiding principle for performing a successful observational study is the research question and the approach in advance of executing a study. However, in multi-centre studies, finding suitable datasets to support the study is challenging, time-consuming, and sometimes impossible without a deep understanding of each dataset.

Methods: We propose a strategy for retrieving biomedical datasets of interest that were semantically annotated, using an interface built by applying a methodology for transforming natural language questions into formal language queries. The advantages of creating biomedical semantic data are enhanced by using natural language interfaces to issue complex queries without manipulating a logical query language.

Results: Our methodology was validated using Alzheimer's disease datasets published in a European platform for sharing and reusing biomedical data. We converted data to semantic information format using biomedical ontologies in everyday use in the biomedical community and published it as a FAIR endpoint. We have considered natural language questions of three types: single-concept questions, questions with exclusion criteria, and multi-concept questions. Finally, we analysed the performance of the question-answering module we used and its limitations. The source code is publicly available at https://bioinformatics-ua.github.io/BioKBQA/.

Conclusion: We propose a strategy for using information extracted from biomedical data and transformed into a semantic format using open biomedical ontologies. Our method uses natural language to formulate questions to be answered by this semantic data without the direct use of formal query languages.

Keywords: Biomedical data; Information extraction; Knowledge bases; Linked data; Natural language interfaces; Question answering; Semantic data.

Publication types

  • Observational Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Factual
  • Language
  • Natural Language Processing*
  • Semantics*
  • Software