Combining expert knowledge and knowledge automatically acquired from electronic data sources for continued ontology evaluation and improvement

J Biomed Inform. 2015 Oct:57:42-52. doi: 10.1016/j.jbi.2015.07.014. Epub 2015 Jul 23.

Abstract

Introduction: A common bottleneck during ontology evaluation is knowledge acquisition from domain experts for gold standard creation. This paper contributes a novel semi-automated method for evaluating the concept coverage and accuracy of biomedical ontologies by complementing expert knowledge with knowledge automatically extracted from clinical practice guidelines and electronic health records, which minimizes reliance on expensive domain expertise for gold standards generation.

Methods: We developed a bacterial clinical infectious diseases ontology (BCIDO) to assist clinical infectious disease treatment decision support. Using a semi-automated method we integrated diverse knowledge sources, including publically available infectious disease guidelines from international repositories, electronic health records, and expert-generated infectious disease case scenarios, to generate a compendium of infectious disease knowledge and use it to evaluate the accuracy and coverage of BCIDO.

Results: BCIDO has three classes (i.e., infectious disease, antibiotic, bacteria) containing 593 distinct concepts and 2345 distinct concept relationships. Our semi-automated method generated an ID knowledge compendium consisting of 637 concepts and 1554 concept relationships. Overall, BCIDO covered 79% (504/637) of the concepts and 89% (1378/1554) of the concept relationships in the ID compendium. BCIDO coverage of ID compendium concepts was 92% (121/131) for antibiotic, 80% (205/257) for infectious disease, and 72% (178/249) for bacteria. The low coverage of bacterial concepts in BCIDO was due to a difference in concept granularity between BCIDO and infectious disease guidelines. Guidelines and expert generated scenarios were the richest source of ID concepts and relationships while patient records provided relatively fewer concepts and relationships.

Conclusions: Our semi-automated method was cost-effective for generating a useful knowledge compendium with minimal reliance on domain experts. This method can be useful for continued development and evaluation of biomedical ontologies for better accuracy and coverage.

Keywords: Antibiotic; Bacteria; Evaluation; Infectious disease; Knowledge acquisition; Ontology.

MeSH terms

  • Bacterial Infections
  • Biological Ontologies*
  • Decision Support Systems, Clinical
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval*
  • Knowledge*