Definition, Composition, and Harmonization of Core Datasets Within the German Center for Lung Research

Stud Health Technol Inform. 2023 May 18:302:696-700. doi: 10.3233/SHTI230242.

Abstract

A core dataset is a compilation of the essential data items for a given research scope. Because core datasets capture the commonalities between heterogeneous data collections, they serve as a basis for cross-site and cross-disease research. Researchers at the national and international levels have therefore addressed the problem of missing core datasets. The German Center for Lung Research (DZL) comprises five sites and eight disease areas and aims to gain further scientific knowledge by continuously promoting collaborations. In this study, we developed a methodology for defining core datasets in the field of lung health science. With the support of domain experts, we then applied this method to compile a core dataset for each DZL disease area and a general core dataset for lung research. All included data items were annotated with metadata and, where possible, assigned references to international classification systems. Our findings will support future scientific collaborations and meaningful data collections.

Keywords: Data collection; controlled vocabulary; datasets as topic; quality indicators; respiratory system.

MeSH terms

  • Data Collection
  • Lung*
  • Metadata*