Development of a REDCap-based workflow for high-volume relational data analysis on real-time data in a medical department using open source software

Comput Methods Programs Biomed. 2022 Nov:226:107111. doi: 10.1016/j.cmpb.2022.107111. Epub 2022 Sep 6.

Abstract

Background/aim: The availability of large volumes of clinical data gives medical departments the opportunity to run large-scale analyses, but it also creates the need for a data-storage and data-analysis strategy that is technically feasible and economically sustainable with limited resources and manpower. The aim of this study was therefore to develop a widely usable data-collection and data-analysis workflow that medical departments can apply to perform high-volume relational data analysis on real-time data.

Methods: A sample project, based on a research database of prostate-specific-membrane-antigen/positron-emission-tomography scans performed in prostate cancer patients at our department, was used to develop a new workflow for data collection and data analysis. A checklist of requirements for a successful data-collection/analysis strategy, based on shared clinical research experience, was used as the reference standard. Software libraries were selected based on widespread availability, reliability, cost, and the technical expertise of the research team (REDCap-v11.0.0 for collaborative data collection, Python-v3.8.5 for data retrieval and SQLite-v3.31.1 for data storage). The primary objective of this study was to develop and implement a workflow to: a) easily store large volumes of structured data in a relational database, b) perform scripted analyses on relational data retrieved in real time from the database. The secondary objective was to improve the cost-effectiveness of the strategy by using open-source/cost-free software libraries.

Results: A fully working data strategy was developed and successfully applied to a sample research project. The REDCap platform provided a remote and secure method to collaboratively collect large volumes of standardized relational data, with low technical difficulty and role-based access control. A Python program was written to retrieve live data through the REDCap API and persist them to an SQLite database, preserving data relationships. SQL enabled the retrieval of complex datasets, while Python allowed for scripted data computation and analysis. Only cost-free software libraries were used, and the sample code was made available through a GitHub repository.
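
The retrieve, persist and query steps described in the Results can be illustrated with a minimal Python sketch. It is not the code from the authors' GitHub repository. It assumes a REDCap project whose records are exported in flat JSON format, with one non-repeating row per patient and one repeating-instrument row per scan. The field names record_id, birth_year and psa, the table names patient and scan, and the helper function names are hypothetical and chosen only for illustration; the API URL and token must come from the local REDCap installation.

```python
import json
import sqlite3
import urllib.parse
import urllib.request


def build_export_payload(token):
    """Form fields for a REDCap record export in flat JSON format."""
    return {"token": token, "content": "record", "format": "json", "type": "flat"}


def fetch_records(api_url, token):
    """POST the export request to the REDCap API and decode the JSON rows."""
    body = urllib.parse.urlencode(build_export_payload(token)).encode("ascii")
    request = urllib.request.Request(api_url, data=body)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def _number(value, cast):
    """REDCap exports blanks as empty strings; store them as NULL."""
    return cast(value) if value not in ("", None) else None


# Hypothetical two-table schema: one patient has many scans.
SCHEMA = """
CREATE TABLE IF NOT EXISTS patient (
    record_id  TEXT PRIMARY KEY,
    birth_year INTEGER
);
CREATE TABLE IF NOT EXISTS scan (
    record_id TEXT NOT NULL REFERENCES patient(record_id),
    instance  INTEGER NOT NULL,
    psa       REAL,
    PRIMARY KEY (record_id, instance)
);
"""


def persist(conn, records):
    """Split flat REDCap rows into parent and child tables (idempotent)."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Rows with an empty repeat-instrument field carry the patient-level data.
    patients = [r for r in records if not r.get("redcap_repeat_instrument")]
    scans = [r for r in records if r.get("redcap_repeat_instrument")]
    conn.executemany(
        "INSERT INTO patient (record_id, birth_year) VALUES (?, ?) "
        "ON CONFLICT(record_id) DO UPDATE SET birth_year = excluded.birth_year",
        [(r["record_id"], _number(r.get("birth_year"), int)) for r in patients],
    )
    conn.executemany(
        "INSERT INTO scan (record_id, instance, psa) VALUES (?, ?, ?) "
        "ON CONFLICT(record_id, instance) DO UPDATE SET psa = excluded.psa",
        [
            (r["record_id"], int(r["redcap_repeat_instance"]),
             _number(r.get("psa"), float))
            for r in scans
        ],
    )
    conn.commit()


def scans_per_patient(conn):
    """Relational query: scan count and mean PSA for every patient."""
    sql = """
        SELECT p.record_id, COUNT(s.instance), AVG(s.psa)
        FROM patient AS p
        LEFT JOIN scan AS s ON s.record_id = p.record_id
        GROUP BY p.record_id
        ORDER BY p.record_id
    """
    return conn.execute(sql).fetchall()
```

In use, the output of fetch_records(api_url, token) would be passed to persist together with a connection from sqlite3.connect, and scans_per_patient would then be called on the same connection. Patients are inserted before scans so that the foreign key from scan to patient always resolves, and the upsert statements let the script be rerun against live data without duplicating rows.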

Conclusions: A REDCap-based data-collection and data-analysis workflow, suitable for high-volume relational data-analysis on live data, was developed and successfully implemented using open-source software.

Keywords: Data analysis; Data collection; Database management systems; Workflow.

MeSH terms

  • Data Analysis*
  • Databases, Factual
  • Humans
  • Reproducibility of Results
  • Software*
  • Workflow