Lessons learned to enable question answering on knowledge graphs extracted from scientific publications: A case study on the coronavirus literature

J Biomed Inform. 2023 Jun:142:104382. doi: 10.1016/j.jbi.2023.104382. Epub 2023 May 6.

Abstract

This article presents a workflow for building a question-answering system whose knowledge base combines knowledge graphs with scientific publications on coronaviruses. It draws on experience gained in modeling evidence from research articles to answer questions posed in natural language. The work describes best practices for acquiring scientific publications, tuning language models to identify and normalize relevant entities, building representational models based on probabilistic topics, and formalizing an ontology that describes the associations between domain concepts supported by the scientific literature. All resources generated for the coronavirus domain are openly available as part of the Drugs4COVID initiative and can be reused independently or as a whole. They can be used by scientific communities conducting research on SARS-CoV-2/COVID-19, as well as by therapeutic communities, laboratories, and similar groups that wish to find and understand relationships between symptoms, drugs, and active ingredients, together with their documentary evidence.

Keywords: Evidences; Knowledge graphs; Ontology; Question-answering.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19*
  • Humans
  • Pattern Recognition, Automated
  • Publications
  • SARS-CoV-2