Automatic background knowledge selection for matching biomedical ontologies

PLoS One. 2014 Nov 7;9(11):e111226. doi: 10.1371/journal.pone.0111226. eCollection 2014.

Abstract

Ontology matching is a growing field of research that is of critical importance for the Semantic Web initiative. The use of background knowledge for ontology matching is often a key factor for success, particularly in complex and lexically rich domains such as the life sciences. However, in most ontology matching systems, the background knowledge sources are either predefined by the system or have to be provided by the user. In this paper, we present a novel methodology for automatically selecting background knowledge sources for any given pair of ontologies to be matched. This methodology measures the usefulness of each background knowledge source by assessing the fraction of classes mapped through it over those mapped directly, which we call the mapping gain. We implemented this methodology in the AgreementMakerLight ontology matching framework and evaluated it using the benchmark biomedical ontology matching tasks from the Ontology Alignment Evaluation Initiative (OAEI) 2013. In each matching problem, our methodology consistently identified the sources of background knowledge that led to the highest improvements over the baseline alignment (i.e., the alignment obtained without background knowledge). Furthermore, our proposed mapping gain parameter is strongly correlated with the F-measure of the produced alignments, which makes it a good estimator of the performance of ontology matching techniques based on background knowledge.
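The abstract defines the mapping gain only informally. The following is a minimal Python sketch under stated assumptions, not the AgreementMakerLight implementation. Alignments are modelled as sets of (source class, target class) pairs. "Classes mapped through" a source is read as classes that the background-knowledge alignment covers but the direct alignment does not. The selection threshold, the source names, and the class names are hypothetical. The sketch also computes the standard F-measure (the harmonic mean of precision and recall against a reference alignment) that is used to evaluate the produced alignments.

```python
# Hypothetical sketch of mapping-gain-based selection of background knowledge
# (BK) sources. Not the AgreementMakerLight code; alignments are modelled as
# sets of (source_class, target_class) pairs, and the threshold is illustrative.
from typing import Dict, List, Set, Tuple

Mapping = Tuple[str, str]
Alignment = Set[Mapping]


def mapped_classes(alignment: Alignment) -> Set[Tuple[str, str]]:
    """Return the classes taking part in any mapping, tagged by ontology side."""
    classes: Set[Tuple[str, str]] = set()
    for source_class, target_class in alignment:
        classes.add(("source", source_class))
        classes.add(("target", target_class))
    return classes


def mapping_gain(direct: Alignment, via_bk: Alignment) -> float:
    """Classes mapped through a BK source but not directly, over classes mapped directly.

    Assumption: only classes *not* already mapped directly count as gained.
    """
    direct_classes = mapped_classes(direct)
    if not direct_classes:
        return 0.0  # avoid division by zero when the baseline maps nothing
    gained = mapped_classes(via_bk) - direct_classes
    return len(gained) / len(direct_classes)


def select_sources(direct: Alignment, candidates: Dict[str, Alignment],
                   threshold: float = 0.05) -> List[str]:
    """Rank candidate BK sources by mapping gain and keep those at or above a
    hypothetical threshold."""
    gains = {name: mapping_gain(direct, alignment)
             for name, alignment in candidates.items()}
    ranked = sorted(gains, key=lambda name: gains[name], reverse=True)
    return [name for name in ranked if gains[name] >= threshold]


def f_measure(alignment: Alignment, reference: Alignment) -> float:
    """Harmonic mean of precision and recall against a reference alignment."""
    correct = len(alignment & reference)
    if correct == 0:
        return 0.0
    precision = correct / len(alignment)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data; class names and source names are made up for illustration.
    direct = {("s:heart", "t:heart"), ("s:lung", "t:lung")}
    candidates = {
        "source_A": {("s:heart", "t:heart"), ("s:cardiac_muscle", "t:heart_muscle")},
        "source_B": {("s:lung", "t:lung")},
    }
    reference = direct | {("s:cardiac_muscle", "t:heart_muscle")}

    print(mapping_gain(direct, candidates["source_A"]))  # 0.5
    print(select_sources(direct, candidates))             # ['source_A']
    print(round(f_measure(direct, reference), 3))          # 0.8
    print(f_measure(direct | candidates["source_A"], reference))  # 1.0
```

Subtracting the directly mapped classes reflects a reading of "gain" as added coverage. If the paper's definition counts every class mapped through the source, that subtraction would be dropped.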

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Automation
  • Biological Ontologies*
  • Humans
  • Internet*
  • Mice
  • Semantics

Grants and funding

DF, CP, ES and FMC were supported by the 'Fundacao para a Ciencia e Tecnologia' (http://www.fct.pt/) through the SOMER project (PTDC/EIA-EIA/119119/2010) and the multi-annual funding program to LASIGE, and by the European Commission (http://ec.europa.eu) through the BiobankCloud project under the Seventh Framework Programme (grant #317871). IFC was supported by the National Science Foundation (http://www.nsf.gov/) through awards CCF-1331800, IIS-1213013, IIS-1143926 and IIS-0812258, and by a Civic Engagement Research Fund Award from the Institute for Policy and Civic Engagement of the University of Illinois at Chicago (http://www.uic.edu/cuppa/ipce/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.