ELI: an IoT-aware big data pipeline with data curation and data quality

PeerJ Comput Sci. 2023 Oct 2;9:e1605. doi: 10.7717/peerj-cs.1605. eCollection 2023.

Abstract

The complexity of analysing data from IoT sensors requires the use of Big Data technologies, which poses challenges such as data curation and data quality assessment. Failing to address both aspects can lead to erroneous decision-making (e.g., processing incorrectly treated data, introducing errors into processes, causing damage or increasing costs). This article presents ELI, an IoT-based Big Data pipeline that implements a data curation process and assesses the usability of data collected by IoT sensors in both offline and online scenarios. The pipeline integrates data transformation and integration tools with a customisable decision model, based on the Decision Model and Notation (DMN) standard, that evaluates data quality. Our study emphasises the importance of data curation and data quality when integrating IoT information, since identifying and discarding low-quality data removes data that obstruct meaningful insights and introduce errors into decision-making. We evaluated our approach in a smart farm scenario using agricultural humidity and temperature data collected from various types of sensors. The proposed model produced consistent results in both offline and online (stream data) scenarios. In addition, we conducted a performance evaluation, which demonstrated the pipeline's effectiveness. In summary, this article contributes a usable and effective IoT-based Big Data pipeline that curates data and assesses data usability in both online and offline scenarios. It also introduces customisable decision models for measuring data quality across multiple dimensions.
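The abstract does not list the quality dimensions or rules that ELI's DMN models use. As an illustration only, the following Python sketch mimics a DMN decision table with a FIRST hit policy, where rules are checked in order and the first match wins. It applies the table to humidity and temperature readings and assumes three hypothetical dimensions: completeness, validity and timeliness. The `Reading` type, the plausibility ranges, the 600-second freshness threshold and the verdict labels are all assumptions, not the authors' model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Reading:
    sensor_id: str
    humidity: Optional[float]      # relative humidity in %, None if missing
    temperature: Optional[float]   # degrees Celsius, None if missing
    age_seconds: float             # time elapsed since the reading was taken


# Hypothetical per-dimension checks (illustrative, not taken from the paper).
def completeness(r: Reading) -> bool:
    return r.humidity is not None and r.temperature is not None


def validity(r: Reading) -> bool:
    # Assumed plausibility ranges; completeness is checked first so that
    # missing values are never compared against the bounds.
    return (completeness(r)
            and 0.0 <= r.humidity <= 100.0
            and -40.0 <= r.temperature <= 60.0)


def timeliness(r: Reading, max_age: float = 600.0) -> bool:
    return r.age_seconds <= max_age


# DMN-style decision table with a FIRST hit policy: each rule maps the
# (complete, valid, timely) inputs to a verdict; the first match is returned.
Rule = Tuple[Callable[[bool, bool, bool], bool], str]
RULES: List[Rule] = [
    (lambda c, v, t: not c, "DISCARD"),
    (lambda c, v, t: not v, "DISCARD"),
    (lambda c, v, t: not t, "REVIEW"),
    (lambda c, v, t: True, "USABLE"),   # default rule
]


def assess(r: Reading) -> str:
    inputs = (completeness(r), validity(r), timeliness(r))
    for condition, verdict in RULES:
        if condition(*inputs):
            return verdict
    raise ValueError("no rule matched")


readings = [
    Reading("s1", 55.2, 21.4, 30.0),
    Reading("s2", None, 19.0, 30.0),
    Reading("s3", 140.0, 22.0, 30.0),
    Reading("s4", 60.0, 20.0, 3600.0),
]
print([assess(r) for r in readings])  # ['USABLE', 'DISCARD', 'DISCARD', 'REVIEW']
```

The rules are kept as ordered data rather than nested conditionals. This loosely mirrors how a DMN table can be edited without touching pipeline code, which is one plausible reading of "customisable". Because `assess` works on one record at a time, the same function could be applied to a stored batch or to records arriving from a stream.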

Keywords: Big data pipeline; Data curation; Data quality; Internet of Things; Sensors.

Grants and funding

This work has been funded by the projects AETHER-US (PID2020-112540RB-C44/AEI/10.13039/501100011033) and ALBA-US (TED2021-130355B-C32) by MCIN/AEI/10.13039/501100011033, COPERNICA (P20_01224) and METAMORFOSIS (US-1381375). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.