Developing a common data model approach for DISCOVER CKD: A retrospective, global cohort of real-world patients with chronic kidney disease

PLoS One. 2022 Sep 29;17(9):e0274131. doi: 10.1371/journal.pone.0274131. eCollection 2022.

Abstract

Objectives: To describe a flexible common data model (CDM) approach that can be efficiently tailored to study-specific needs to facilitate pooled patient-level analysis and aggregated/meta-analysis of routinely collected retrospective patient data from disparate data sources; and to detail the application of this CDM approach to the DISCOVER CKD retrospective cohort, a longitudinal database of routinely collected (secondary) patient data of individuals with chronic kidney disease (CKD).

Methods: The flexible CDM approach incorporated three independent, exchangeable components that preceded data mapping and data model implementation: (1) standardized code lists (unifying medical events from different coding systems); (2) laboratory unit harmonization tables; and (3) base cohort definitions. Events were not mapped code-to-code between coding vocabularies; instead, for each data source, code lists of labels were curated at the entity/event level. A study team of epidemiologists, clinicians, informaticists, and data scientists was involved in validating each component.
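As a rough illustration of the first two components, the minimal Python sketch below shows how per-source, entity-level code lists and a laboratory unit harmonization table might be represented. It is not the authors' implementation: the source names, the codes, the entity label, and the function names `label_event` and `harmonize_lab` are all hypothetical. The only real-world value used is the standard creatinine conversion factor (1 mg/dL = 88.4 µmol/L).

```python
# Hypothetical entity-level code lists, curated separately for each data source.
# Codes are never mapped to each other; each one maps to an entity label.
CODE_LISTS = {
    "source_a": {"ckd_stage_3": {"A-001", "A-002"}},
    "source_b": {"ckd_stage_3": {"B-17"}},
}

# Hypothetical unit harmonization table:
# (analyte, source unit) -> (target unit, multiplication factor).
UNIT_TABLE = {
    ("creatinine", "mg/dL"): ("umol/L", 88.4),
    ("creatinine", "umol/L"): ("umol/L", 1.0),
}


def label_event(source, code):
    """Return the entity label for a source-specific code, or None if unlisted."""
    for entity, codes in CODE_LISTS.get(source, {}).items():
        if code in codes:
            return entity
    return None


def harmonize_lab(analyte, value, unit):
    """Convert a lab value to the target unit defined in UNIT_TABLE."""
    target_unit, factor = UNIT_TABLE[(analyte, unit)]
    return value * factor, target_unit
```

Because each lookup is keyed by data source, a code from one vocabulary is never interpreted under another. Keeping the code lists and the unit table as separate structures also means either one can be swapped out without touching the other, which is one way to read the "exchangeable components" described above.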

Results: Applying the CDM to the DISCOVER CKD retrospective cohort, secondary data from 1,857,593 patients with CKD were harmonized from five data sources, across three countries, into a discrete database for rapid real-world evidence generation.

Conclusions: This flexible CDM approach facilitates evidence generation from real-world data within the DISCOVER CKD retrospective cohort, providing novel insights into the epidemiology of CKD that may expedite improvements in diagnosis, prognosis, early intervention, and disease management. The adaptable architecture of this CDM approach ensures scalable, fast, and efficient application within other therapy areas to facilitate the combined analysis of different types of secondary data from multiple, heterogeneous sources.

MeSH terms

  • Cohort Studies
  • Databases, Factual
  • Disease Management
  • Humans
  • Renal Insufficiency, Chronic* / diagnosis
  • Renal Insufficiency, Chronic* / epidemiology
  • Retrospective Studies

Grants and funding

This manuscript, including medical writing and editorial support, was funded by AstraZeneca. The sponsor was involved in the study design; the collection, analysis, and interpretation of data; and the checking of information provided in the manuscript. SK is an employee and stockholder of AstraZeneca. MA is an employee of AstraZeneca. GJ was an employee of AstraZeneca at the time of the study. Ultimate responsibility for opinions, conclusions, and data interpretation lies with the authors.