SASC: A simple approach to synthetic cohorts for generating longitudinal observational patient cohorts from COVID-19 clinical data

Patterns (N Y). 2022 Apr 8;3(4):100453. doi: 10.1016/j.patter.2022.100453. Epub 2022 Feb 9.

Abstract

One of the impacts of the coronavirus disease 2019 (COVID-19) pandemic has been a push for researchers to better exploit synthetic data and accelerate the design, analysis, and modeling of clinical trials. The unprecedented clinical effort prompted by the emergence of COVID-19 is expected to foster robust and innovative statistical approaches in clinical research. Here, we report the development of SASC, a simple but efficient approach for generating COVID-19-related synthetic clinical data through a web application. SASC takes basic summary statistics for each group of patients and attempts to generate the individual variables so that the correlations among them are respected. To assess the reliability of the results, we provide statistical comparisons with Synthea, a well-known synthetic patient generator, and, more importantly, with clinical data from real COVID-19 patients. The source code and web application are available on GitHub, Zenodo, and Mendeley Data.
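The general idea of turning group-level summary statistics into correlated patient-level variables can be sketched as follows. This is a minimal illustration under stated assumptions, not the SASC implementation: it assumes the variables are jointly Gaussian, and the function name `generate_group`, the two variables, and all numeric values are hypothetical and do not come from the paper.

```python
import numpy as np

def generate_group(n, means, sds, corr, seed=0):
    """Draw n synthetic patients from per-variable means, standard
    deviations, and a correlation matrix (Gaussian assumption)."""
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    corr = np.asarray(corr, dtype=float)
    # Covariance is the correlation matrix scaled by the standard deviations.
    cov = corr * np.outer(sds, sds)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(means, cov, size=n)

# Hypothetical summary statistics for one patient group (illustrative only):
# column 0 is age in years, column 1 is C-reactive protein in mg/L.
cohort = generate_group(
    n=5000,
    means=[62.0, 80.0],
    sds=[15.0, 40.0],
    corr=[[1.0, 0.3], [0.3, 1.0]],
)

# Compare the synthetic sample against the input summary statistics.
print(cohort.mean(axis=0))
print(np.corrcoef(cohort, rowvar=False)[0, 1])
```

A plain Gaussian draw can produce implausible values, such as negative laboratory measurements, so a practical generator would likely need truncation or a transformation for skewed or bounded variables.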

Keywords: COVID-19; SASC; VC; clinical trial; real-world data; Synthea; virtual cohort.