Research-ready data: the C-Surv data model

Eur J Epidemiol. 2023 Feb;38(2):179-187. doi: 10.1007/s10654-022-00916-y. Epub 2023 Jan 7.

Abstract

Research-ready data (data curated to a defined standard) increase scientific opportunity and rigour by integrating the data environment. The development of research platforms has highlighted the value of research-ready data, particularly for multi-cohort analyses. Following stakeholder consultation, a standard data model (C-Surv), optimised for data discovery, was developed using data from 5 population and clinical cohort studies. The model uses a four-tier nested structure based on 18 data themes selected according to user behaviour or technology. Standard variable naming conventions are applied to uniquely identify variables within the context of longitudinal studies. The data model was used to develop a harmonised dataset for 11 cohorts. This dataset populated the Cohort Explorer data discovery tool for assessing the feasibility of an analysis prior to making a data access request. Data preparation times were compared between cohort-specific data models and C-Surv. It was concluded that adopting a common data model as a data standard for the discovery and analysis of research cohort data offers multiple benefits.

MeSH terms

  • Cohort Studies
  • Datasets as Topic*
  • Humans
  • Longitudinal Studies*
  • Models, Theoretical*