An Ensemble-Based Scalable Approach for Intrusion Detection Using Big Data Framework

Big Data. 2021 Aug;9(4):303-321. doi: 10.1089/big.2020.0201. Epub 2021 Jul 16.

Abstract

In this study, we set up a scalable framework for large-scale data processing and analytics using a big data framework. Popular classification methods are implemented, tuned, and evaluated on intrusion datasets, with the objective of selecting the best classifier after optimizing its hyper-parameters. We observed that the decision tree (DT) approach outperforms the other methods in terms of classification accuracy, training time, and average prediction rate. It is therefore selected as the base classifier in our proposed ensemble approach for studying class imbalance. Because intrusion datasets are imbalanced, most classification techniques are biased toward the majority class, and the misclassification rate is higher for the minority class. We propose an ensemble-based method that uses K-Means, RUSBoost, and DT to mitigate the class imbalance problem; empirically investigate the impact of class imbalance on the performance of classification approaches; and compare the results by using performance metrics such as Balanced Accuracy, Matthews Correlation Coefficient, and F-Measure, which are more suitable for assessing imbalanced datasets.
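To illustrate why the metrics named above suit imbalanced intrusion data better than plain accuracy, the following is a minimal sketch that computes them from a binary confusion matrix using their standard definitions. The function name `imbalance_metrics` and the example counts are hypothetical and are not taken from the paper; the attack class is assumed to be the positive (minority) class, and F-Measure is taken here to be the F1 score.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Compute accuracy, balanced accuracy, MCC, and F1 from binary counts.

    Hypothetical helper for illustration; positive = minority (attack) class.
    Any ratio with a zero denominator is reported as 0.0.
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    accuracy = (tp + tn) / total if total else 0.0
    balanced_accuracy = (recall + specificity) / 2

    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # Matthews Correlation Coefficient: ranges from -1 to 1, 0 means no skill.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
        "f1": f1,
    }


# Made-up example: 990 normal records, 10 attacks, and a classifier that
# labels every record as normal (the majority class).
majority_only = imbalance_metrics(tp=0, fp=0, tn=990, fn=10)
print(majority_only)
```

In this made-up case the accuracy is 0.99 even though no attack is detected, while Balanced Accuracy is 0.5 and both MCC and F1 are 0, which exposes the bias toward the majority class.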

Keywords: Decision Tree; K-NN; SVM; big data analytics; ensemble methods; intrusion detection.

MeSH terms

  • Algorithms*
  • Big Data*