A Quality Control Methodology for Heterogeneous Vehicular Data Streams

Sensors (Basel). 2022 Feb 18;22(4):1550. doi: 10.3390/s22041550.

Abstract

The rapid evolution of sensors and communication technologies has led to the production and transfer of mass data streams from vehicles either inside their electronic units or to the outside world using the internet infrastructure. The "outside world", in most cases, consists of third-party applications, such as fleet or traffic management control centers, which utilize vehicular data for reporting and monitoring functionalities. Such applications, in most cases, in order to facilitate their needs, require the exchange and processing of vast amounts of data which can be handled by the so-called Big Data technologies. The purpose of this study is to present a hybrid platform suitable for data collection, storing and analysis enhanced with quality control actions. In particular, the collected data contain various formats originating from different vehicle sensors and are stored in the aforementioned platform in a continuous way. The stored data in this platform must be checked in order to determine and validate them in terms of quality. To do so, certain actions, such as missing values checks, format checks, range checks, etc., must be carried out. The results of the quality control functions are presented herein, and useful conclusions are drawn in order to avoid possible data quality problems which may occur in further analysis and use of the data, e.g., for training of artificial intelligence models.

Keywords: big data; data quality control; data streams; vehicle sensors.