Data Management for Health Data Reuse: Proposal of a Standard Workflow and a R Tutorial with Jupyter Notebook

Stud Health Technol Inform. 2022 Aug 31:298:82-86. doi: 10.3233/SHTI220912.

Abstract

Data collected in clinical registries or obtained through data reuse must be modified to suit research needs. Several common operations are frequently applied: selecting relevant patients from the cohort, combining data from multiple sources, adding new variables where needed, and creating unique tables depending on the research purpose. We carried out a qualitative survey, conducting semi-structured interviews with 7 experts in data reuse, and proposed a standard workflow for health data management. To make the data management workflow easier to understand, we implemented an R tutorial in Jupyter Notebook based on a synthetic data set.
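
The four operations named in the abstract (patient selection, combination of sources, derivation of new variables, and creation of a purpose-specific table) can be illustrated with a minimal base R sketch. This is not the authors' tutorial or their workflow specification: the tables `patients` and `stays`, their columns, the inclusion criterion, and the derived variables are all hypothetical and chosen only for illustration.

```r
# Hypothetical synthetic data: two source tables sharing the key patient_id.
patients <- data.frame(
  patient_id = 1:5,
  birth_year = c(1950L, 1985L, 1972L, 2001L, 1940L),
  sex        = c("F", "M", "F", "M", "F"),
  stringsAsFactors = FALSE
)
stays <- data.frame(
  patient_id     = c(1L, 1L, 2L, 3L, 5L, 5L, 5L),
  admission_date = as.Date(c("2020-01-10", "2020-03-02", "2020-02-14",
                             "2020-05-20", "2020-01-05", "2020-04-11",
                             "2020-06-30")),
  discharge_date = as.Date(c("2020-01-15", "2020-03-04", "2020-02-15",
                             "2020-05-29", "2020-01-06", "2020-04-18",
                             "2020-07-03")),
  stringsAsFactors = FALSE
)

# 1. Select relevant patients (assumed criterion: born before 1980).
cohort <- patients[patients$birth_year < 1980, ]

# 2. Combine data from multiple sources: inner join on the shared key.
cohort_stays <- merge(cohort, stays, by = "patient_id")

# 3. Add new variables: length of stay in days and age at admission in years.
cohort_stays$length_of_stay <- as.numeric(
  difftime(cohort_stays$discharge_date, cohort_stays$admission_date,
           units = "days")
)
cohort_stays$age_at_admission <-
  as.integer(format(cohort_stays$admission_date, "%Y")) -
  cohort_stays$birth_year

# 4. Create one table for the research purpose: one row per patient.
n_stays <- aggregate(length_of_stay ~ patient_id, data = cohort_stays,
                     FUN = length)
names(n_stays)[2] <- "n_stays"
total_los <- aggregate(length_of_stay ~ patient_id, data = cohort_stays,
                       FUN = sum)
names(total_los)[2] <- "total_los"

analysis_table <- merge(merge(cohort, n_stays, by = "patient_id"),
                        total_los, by = "patient_id")
print(analysis_table)
```

Keeping each step in its own intermediate object (`cohort`, `cohort_stays`, `analysis_table`) makes it easy to inspect the data after every operation, which suits a notebook cell-by-cell presentation.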

Keywords: Data Science; Data management; Data reuse; Education; Programming.

MeSH terms

  • Data Management*
  • Humans
  • Workflow