Common data model for COVID-19 datasets

Philipp Wegner; Geena Mariya Jose; Vanessa Lage-Rupprecht; Sepehr Golriz Khatami; Bide Zhang; Stephan Springstubbe; Marc Jacobs; Thomas Linden; Cindy Ku; Bruce Schultz; Martin Hofmann-Apitius; Alpha Tom Kodamullil; COPERIMOplus Consortium

doi:10.1093/bioinformatics/btac651

Bioinformatics. 2022 Dec 13;38(24):5466-5468. doi: 10.1093/bioinformatics/btac651.

Affiliations

¹ Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, 53757, Germany.
² Causality Biomodels, Kinfra Hi-Tech Park, Cochin, Kerala 683503, India.
³ Bonn-Aachen International Center for IT (B-IT), Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany.

Abstract

Motivation: A global medical crisis like the coronavirus disease 2019 (COVID-19) pandemic requires interdisciplinary and highly collaborative research from all over the world. One of the key challenges for collaborative research is the lack of interoperability among heterogeneous data sources. Interoperability, standardization and mapping of datasets are necessary for data analysis and for applying advanced algorithms, for example in developing personalized risk prediction models.

Results: To ensure interoperability and compatibility among COVID-19 datasets, we present a common data model (CDM) built from 11 different COVID-19 datasets from various geographical locations. The current version of the CDM holds 4639 data variables related to COVID-19, including basic patient information (age, biological sex and diagnosis) as well as disease-specific variables such as anosmia and dyspnea. Each data variable in the model is associated with a specific data type, variable mappings, a value range, data units and data encodings that can be used to standardize any dataset. Moreover, its compatibility with established data standards such as OMOP and FHIR makes the CDM well suited for COVID-19 data interoperability.
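To illustrate how per-variable metadata of this kind (data type, variable mapping, value range, unit, encoding) could be used to standardize a record, here is a minimal Python sketch. It is not the CDM's actual schema or tooling: the class `CdmVariable`, the function `standardize`, the dataset name `site_a`, and all variable names, ranges and encodings below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class CdmVariable:
    """Hypothetical description of one variable in a common data model."""
    name: str                                   # harmonized variable name
    data_type: type                             # target Python type
    unit: Optional[str] = None                  # unit of the harmonized value
    value_range: Optional[Tuple[float, float]] = None   # inclusive bounds
    encoding: Dict[Any, Any] = field(default_factory=dict)   # raw -> harmonized
    source_names: Dict[str, str] = field(default_factory=dict)  # dataset -> column


def standardize(record: Dict[str, Any], dataset: str,
                variables: List[CdmVariable]) -> Dict[str, Any]:
    """Map one source record onto the harmonized variable names."""
    out: Dict[str, Any] = {}
    for var in variables:
        local_name = var.source_names.get(dataset)
        if local_name is None or local_name not in record:
            continue  # this dataset does not provide the variable
        raw = record[local_name]
        # Apply the value encoding first, then cast to the target type.
        value = var.data_type(var.encoding.get(raw, raw))
        if var.value_range is not None:
            low, high = var.value_range
            if not low <= value <= high:
                raise ValueError(f"{var.name}={value!r} outside [{low}, {high}]")
        out[var.name] = value
    return out


# Illustrative variable definitions (assumed, not taken from the CDM).
VARIABLES = [
    CdmVariable("age", int, unit="years", value_range=(0, 120),
                source_names={"site_a": "age_years"}),
    CdmVariable("biological_sex", str,
                encoding={"m": "male", "f": "female"},
                source_names={"site_a": "gender"}),
    CdmVariable("anosmia", bool,
                encoding={"yes": True, "no": False},
                source_names={"site_a": "loss_of_smell"}),
]

if __name__ == "__main__":
    source_record = {"age_years": "54", "gender": "f", "loss_of_smell": "yes"}
    print(standardize(source_record, "site_a", VARIABLES))
```

In this sketch, the encoding is applied before the type cast so that coded source values (for example "yes"/"no") are translated rather than coerced, and an out-of-range value raises an error instead of being passed through silently.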

Availability and implementation: The CDM is available in a public repository at https://github.com/Fraunhofer-SCAI-Applied-Semantics/COVID-19-Global-Model.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
COVID-19*
Humans
Pandemics

Grants and funding

Fraunhofer 'Internal Programs Fraunhofer vs Corona'