CoDR: Correlation-based Data Reduction Scheme for Efficient Gathering of Heterogeneous Driving Data

Sensors (Basel). 2020 Mar 17;20(6):1677. doi: 10.3390/s20061677.

Abstract

A variety of deep learning techniques are actively employed in advanced driver assistance systems, which in turn require gathering large volumes of heterogeneous driving data, such as traffic conditions, driver behavior, vehicle status, and location information. However, these different types of driving data can easily exceed tens of GB per day, which poses a significant hurdle in terms of storage and network cost. To address this problem, this paper proposes a novel scheme, called CoDR, which reduces data volume by considering the correlations among heterogeneous driving data. Among the heterogeneous datasets, CoDR first chooses one as the pivot dataset. Then, according to the objective of the data collection, it identifies the data ranges in the pivot dataset that are relevant to that objective. Finally, it investigates the correlations among the datasets and reduces data volume by eliminating irrelevant data not only from the pivot dataset but also from the remaining datasets. CoDR gathers four heterogeneous driving datasets: two videos (front view and driver behavior), OBD-II data, and GPS data. We show that CoDR decreases data volume by up to 91%. We also present diverse analytical results that reveal the correlations among the four datasets, which can be exploited in edge computing to reduce data volume on the spot.
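The three steps described above (choose a pivot, find the pivot ranges relevant to the objective, drop everything outside those ranges from all datasets) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes, for illustration only, that every dataset is a list of timestamped samples on a shared clock and that correlation between datasets is approximated purely by time alignment. The names `relevant_intervals`, `reduce_datasets`, `is_relevant`, `margin`, and the sample values are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, object]    # (timestamp in seconds, payload); assumed layout
Interval = Tuple[float, float]   # closed time range [start, end]


def relevant_intervals(pivot: List[Sample],
                       is_relevant: Callable[[object], bool],
                       margin: float = 0.0) -> List[Interval]:
    """Group consecutive relevant pivot samples into padded time ranges."""
    intervals: List[Interval] = []
    run_start = run_end = None
    for ts, payload in sorted(pivot, key=lambda s: s[0]):
        if is_relevant(payload):
            if run_start is None:
                run_start = ts          # a new run of relevant samples begins
            run_end = ts
        elif run_start is not None:
            intervals.append((run_start - margin, run_end + margin))
            run_start = None            # the run ended at the previous sample
    if run_start is not None:           # close a run that reaches the end
        intervals.append((run_start - margin, run_end + margin))
    return intervals


def reduce_datasets(datasets: Dict[str, List[Sample]],
                    pivot_name: str,
                    is_relevant: Callable[[object], bool],
                    margin: float = 0.0) -> Dict[str, List[Sample]]:
    """Keep only samples (pivot and non-pivot) inside the relevant ranges."""
    intervals = relevant_intervals(datasets[pivot_name], is_relevant, margin)

    def keep(ts: float) -> bool:
        return any(lo <= ts <= hi for lo, hi in intervals)

    return {name: [s for s in samples if keep(s[0])]
            for name, samples in datasets.items()}


# Hypothetical toy data: a per-second score from the driver-facing video
# (the assumed pivot), OBD-II readings every 0.5 s, and GPS fixes every second.
datasets = {
    "driver_video": [(0, 0.1), (1, 0.2), (2, 0.9), (3, 0.8), (4, 0.1), (5, 0.1)],
    "obd": [(t * 0.5, 40 + t) for t in range(11)],
    "gps": [(t, (37.0 + t * 1e-4, 127.0)) for t in range(6)],
}

reduced = reduce_datasets(datasets, "driver_video",
                          is_relevant=lambda score: score >= 0.5,
                          margin=0.5)
```

In this toy run the only relevant pivot range is 1.5 s to 3.5 s, so samples outside it are dropped from all three datasets. A real deployment would need a relevance test suited to the collection objective and a correlation analysis richer than timestamp overlap, which is the part the paper itself investigates.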

Keywords: correlation; data reduction; drowsiness detection; heterogeneous driving data; implementation; intelligent data analysis.