Towards Developing a Robust Intrusion Detection Model Using Hadoop-Spark and Data Augmentation for IoT Networks

Sensors (Basel). 2022 Oct 12;22(20):7726. doi: 10.3390/s22207726.

Abstract

In recent years, anomaly detection and machine learning for intrusion detection systems have been used to detect anomalies on Internet of Things (IoT) networks. These systems rely on machine and deep learning to improve detection accuracy. However, the robustness of a model depends on the number of data samples available, the quality of the data, and the distribution of the data classes. In the present paper, we focus specifically on the amount of data and on class imbalance, since both are key factors in IoT, where network traffic is increasing exponentially. For this reason, we propose a framework that uses a big data methodology with Hadoop-Spark to train and test multi-class and binary classification, with a one-vs-rest strategy, for intrusion detection using the entire BoT-IoT dataset. We evaluate all the algorithms available in Hadoop-Spark in terms of accuracy and processing time. In addition, since the BoT-IoT dataset is highly imbalanced, we also improve the accuracy of detecting minority classes by generating more data samples using a Conditional Tabular Generative Adversarial Network (CTGAN). In general, our proposed model outperforms other published models, including our previous model. Using our proposed methodology, the F1-score of one of the minority classes, the Theft attack, improved from 42% to 99%.
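To illustrate why the abstract reports a per-class F1-score rather than overall accuracy, the following is a minimal sketch in plain Python. It is not the authors' Hadoop-Spark pipeline: the helper name `one_vs_rest_f1` and the toy labels are hypothetical and chosen only to show how a minority class such as Theft can score poorly while accuracy stays high.

```python
def one_vs_rest_f1(y_true, y_pred, positive):
    """F1-score for one class, treating every other label as 'rest'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up, imbalanced toy labels (assumption: not taken from BoT-IoT).
y_true = ["DDoS"] * 6 + ["Normal"] * 2 + ["Theft"] * 2
# The hypothetical classifier misses one of the two Theft samples.
y_pred = ["DDoS"] * 6 + ["Normal"] * 2 + ["Theft", "DDoS"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
theft_f1 = one_vs_rest_f1(y_true, y_pred, "Theft")

print(f"accuracy = {accuracy:.2f}")
print(f"Theft F1 = {theft_f1:.2f}")
```

In this toy case a single misclassified Theft sample barely moves the overall accuracy but noticeably lowers the Theft F1-score, which is the kind of gap that oversampling a minority class is meant to close.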

Keywords: BoT-IoT; CTGAN; IoT (internet of things) security; big data framework; hadoop-spark; imbalanced datasets.

Grants and funding

This research was partly funded by a grant from the Natural Sciences and Engineering Research Council of Canada, received by Kshirasagar Naik. In addition, it was funded by Cistech Limited and the University of Waterloo.