PVS-GEN: Systematic Approach for Universal Synthetic Data Generation Involving Parameterization, Verification, and Segmentation

Sensors (Basel). 2024 Jan 2;24(1):266. doi: 10.3390/s24010266.

Abstract

Synthetic data generation addresses the challenges of obtaining extensive empirical datasets, offering benefits such as cost-effectiveness, time efficiency, and robust model development. Nonetheless, synthetic data-generation methodologies still encounter significant difficulties, including a lack of standardized metrics for modeling different data types and comparing generated results. This study introduces PVS-GEN, an automated, general-purpose process for synthetic data generation and verification. The PVS-GEN method parameterizes time-series data with minimal human intervention and verifies model construction using a specific metric derived from extracted parameters. For complex data, the process iteratively segments the empirical dataset until an extracted parameter can reproduce synthetic data that reflects the empirical characteristics, irrespective of the sensor data type. Moreover, we introduce the PoR metric to quantify the quality of the generated data by evaluating its time-series characteristics. Consequently, the proposed method can automatically generate diverse time-series data that covers a wide range of sensor types. We compared PVS-GEN with existing synthetic data-generation methodologies, and PVS-GEN demonstrated a superior performance. It generated data with a similarity of up to 37.1% across multiple data types and by 19.6% on average using the proposed metric, irrespective of the data type.

Keywords: IoT data generation; possibility of reproducibility; synthetic data generation; time-series sensor data; time-series synthesis.