ODMedit: uniform semantic annotation for data integration in medicine based on a public metadata repository

BMC Med Res Methodol. 2016 Jun 1;16:65. doi: 10.1186/s12874-016-0164-9.

Abstract

Background: The volume and complexity of patient data, especially in personalised medicine, are steadily increasing, in both clinical data and genomic profiles. Typically, more than 1,000 items (e.g., laboratory values, vital signs, diagnostic tests) are collected per patient in clinical trials. In oncology, hundreds of mutations can potentially be detected for each patient by genomic profiling. Data integration from multiple sources therefore constitutes a key challenge for medical research and healthcare.

Methods: Semantic annotation of data elements helps to identify matching data elements in different sources and thereby supports data integration. Millions of different annotations are required due to the semantic richness of patient data. These annotations should be uniform, i.e., two matching data elements should carry the same annotations. However, large terminologies such as SNOMED CT or UMLS do not provide uniform coding. It is proposed to develop semantic annotations of medical data elements based on a large-scale public metadata repository. To achieve uniform codes, semantic annotations are to be re-used whenever a matching data element is available in the metadata repository.
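
The re-use rule described in the Methods can be illustrated with a minimal Python sketch. It is not the ODMedit implementation: the in-memory dictionary standing in for the metadata repository, the label normalisation, the function names and the placeholder codes ("CODE-0001" etc.) are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical stand-in for a metadata repository:
# normalised item label -> tuple of semantic codes.
Repository = Dict[str, Tuple[str, ...]]


def normalize(label: str) -> str:
    """Lower-case the label and collapse whitespace (an assumed matching rule)."""
    return " ".join(label.lower().split())


def annotate(
    label: str,
    repository: Repository,
    new_codes: Optional[Iterable[str]] = None,
) -> Tuple[str, ...]:
    """Return the codes for a data element, re-using existing ones if possible.

    If a matching element is already in the repository, its codes are
    returned and any proposed new codes are ignored. Otherwise the
    proposed codes are stored so that later matches re-use them.
    """
    key = normalize(label)
    if key in repository:
        return repository[key]
    if new_codes is None:
        raise LookupError(f"no existing annotation for {label!r}")
    repository[key] = tuple(new_codes)
    return repository[key]


# Placeholder codes, not real terminology identifiers.
repo: Repository = {"body weight": ("CODE-0001",)}

first = annotate("Body  Weight", repo)
second = annotate("systolic blood pressure", repo, new_codes=["CODE-0002"])
third = annotate("Systolic Blood Pressure", repo, new_codes=["CODE-9999"])
```

In this sketch the third call returns the codes stored by the second call rather than the newly proposed ones, which is the sense in which two matching data elements end up with the same annotations.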

Results: A web-based tool called ODMedit (https://odmeditor.uni-muenster.de/) was developed to create data models with uniform semantic annotations. It contains ~800,000 terms with semantic annotations, which were derived from ~5,800 models from the Portal of Medical Data Models (MDM). The tool was successfully applied to manually annotate 22 forms with 292 data items from CDISC and to update 1,495 data models of the MDM portal.

Conclusion: Uniform manual semantic annotation of data models is feasible in principle, but requires a large-scale collaborative effort due to the semantic richness of patient data. A web-based tool for these annotations is available, which is linked to a public metadata repository.

Keywords: Data integration; ODM; Personalised medicine; Semantic annotation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Curation
  • Datasets as Topic / standards*
  • Electronic Health Records
  • Humans
  • Metadata*
  • Reference Standards
  • Semantics
  • Software*