Provenance for Biomedical Ontologies with RDF and Git

Stud Health Technol Inform. 2019 Sep 3:267:230-237. doi: 10.3233/SHTI190832.

Abstract

The German Center for Lung Research (DZL) is a research network dedicated to the study of respiratory diseases. To enable consortium-wide retrospective research and prospective patient recruitment, we integrate data into a central data warehouse. Enhancing the underlying ontology is an ongoing process, for which we developed the Collaborative Metadata Repository (CoMetaR) tool. Its technical infrastructure is based on the Resource Description Framework (RDF) for ontology representation and the distributed version control system Git for storage and versioning. Ontology development involves a considerable amount of data curation, and data provenance improves the feasibility and quality of that curation. Especially in collaborative metadata development, a comprehensive annotation of "who contributed what, when and why" is essential. Although RDF and Git versioning repositories are commonly used, no existing solution captures metadata provenance information in sufficient detail. We propose an enhanced composition of standardized RDF statements for detailed provenance representation. Additionally, we developed an algorithm that extracts provenance data from the repository and translates it into the proposed RDF statements.
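
To illustrate the general idea of turning Git history into RDF provenance statements, the following is a minimal sketch and not the CoMetaR algorithm itself. The abstract does not name the vocabulary, so the use of W3C PROV-O terms is an assumption, as are the example.org namespace, the function names, and the sample commit. The input is assumed to be one line of `git log --pretty=format:%H%x1f%an%x1f%aI%x1f%s` output. The sketch covers "who", "when" and "why"; the "what" would require inspecting the commit's diff and is left out.

```python
from urllib.parse import quote

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
BASE = "http://example.org/provenance/"  # hypothetical namespace, not from the paper


def literal(text, datatype=None):
    """Serialize a string as an N-Triples literal, escaping special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'


def commit_to_ntriples(log_line):
    """Translate one unit-separator-delimited log line into N-Triples statements."""
    sha, author, date, subject = log_line.split("\x1f", 3)
    commit = f"<{BASE}commit/{sha}>"
    agent = f"<{BASE}agent/{quote(author)}>"
    return [
        f"{commit} <{RDF_TYPE}> <{PROV}Activity> .",                      # the change itself
        f"{commit} <{PROV}wasAssociatedWith> {agent} .",                  # who
        f"{commit} <{PROV}endedAtTime> {literal(date, XSD_DATETIME)} .",  # when
        f"{commit} <{RDFS_COMMENT}> {literal(subject)} .",                # why
    ]


# Hypothetical commit record, shaped like the git log format described above.
sample = "\x1f".join(
    ["3f2a9c1", "Jane Doe", "2019-03-01T10:15:00+01:00", 'Add concept "FEV1"']
)
triples = commit_to_ntriples(sample)
print("\n".join(triples))
```

Emitting N-Triples as plain strings keeps the sketch free of third-party dependencies; in practice an RDF library would handle serialization and IRI validation.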

Keywords: Biological ontologies; automatic data processing; data curation; metadata; quality improvement.

MeSH terms

  • Biological Ontologies*
  • Data Warehousing
  • Humans
  • Metadata
  • Prospective Studies
  • Retrospective Studies