The Use of Ensemble Models for Multiple Class and Binary Class Classification for Improving Intrusion Detection Systems

Celestine Iwendi; Suleman Khan; Joseph Henry Anajemba; Mohit Mittal; Mamdouh Alenezi; Mamoun Alazab

doi:10.3390/s20092559

Sensors (Basel). 2020 Apr 30;20(9):2559. doi: 10.3390/s20092559.

Authors

Celestine Iwendi¹, Suleman Khan², Joseph Henry Anajemba³, Mohit Mittal⁴, Mamdouh Alenezi⁵, Mamoun Alazab⁶

Affiliations

¹ Department of Electronics, BCC of Central South University of Forestry and Tech, Changsha 410004, China.
² Department of Computer Science, Air University, Islamabad 44000, Pakistan.
³ Department of Communication Engineering, Hohai University, Changzhou 211100, China.
⁴ Department of Information Science and Engineering, Kyoto Sangyo University, Kyoto 603-8555, Japan.
⁵ College of Computer and Information Sciences, Prince Sultan University, Riyadh 12435, Saudi Arabia.
⁶ College of Engineering, IT and Environment, Charles Darwin University, Casuarina NT 0800, Australia.

Abstract

The effort to detect abnormal behavior inside and outside a network led to the development of intrusion detection systems, an area in which many researchers have applied soft computing and machine learning. A single classifier on its own, however, appears insufficient to detect network intruders reliably. This limitation led us to perform dimensionality reduction with a correlation-based feature selection (CFS) approach in combination with a refined ensemble model. The paper aims to improve the Intrusion Detection System (IDS) by proposing a CFS + Ensemble Classifiers (Bagging and AdaBoost) model that achieves high accuracy, a high packet detection rate, and a low false alarm rate. Machine learning ensemble models with base classifiers (J48, Random Forest, and REPTree) were built. Binary and multiclass classification were performed on the KDD99 and NSL-KDD datasets. For binary classification, all attacks were labeled as anomaly, in contrast to normal traffic. For multiclass classification, the class labels consisted of four major attack categories, namely Denial of Service (DoS), Probe, User-to-Root (U2R), and Remote-to-Local (R2L), plus the Normal class. Experimental results showed that the proposed model produces a false alarm rate (FAR) of 0 and a detection rate (DR) of 99.90% for the KDD99 dataset, and a FAR of 0.5% and a DR of 98.60% for the NSL-KDD dataset, when working with 6 and 13 selected features, respectively.

Keywords: artificial intelligence; ensemble methods; false positive rate; feature selection; intrusion detection system; machine learning.
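
The binary labeling and the two headline metrics in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly used definitions DR = TP / (TP + FN) and FAR = FP / (FP + TN) with "anomaly" as the positive class; the abstract does not state the formulas, so these are an assumption. The function names, label spellings, and toy labels are hypothetical and are not taken from the paper or its datasets.

```python
from collections import Counter

# Assumed spellings of the four attack categories; "normal" marks benign traffic.
ATTACK_CLASSES = {"dos", "probe", "u2r", "r2l"}


def to_binary(label):
    """Collapse a five-class label into 'normal' or 'anomaly'."""
    label = label.lower()
    if label == "normal":
        return "normal"
    if label in ATTACK_CLASSES:
        return "anomaly"
    raise ValueError(f"unknown class label: {label!r}")


def detection_and_false_alarm_rate(y_true, y_pred):
    """Return (DR, FAR) for binary labels, treating 'anomaly' as positive.

    Assumed definitions: DR = TP / (TP + FN), FAR = FP / (FP + TN).
    A rate whose denominator is zero is reported as 0.0.
    """
    counts = Counter(zip(y_true, y_pred))
    tp = counts[("anomaly", "anomaly")]  # attacks flagged as attacks
    fn = counts[("anomaly", "normal")]   # attacks that were missed
    fp = counts[("normal", "anomaly")]   # normal traffic flagged as attack
    tn = counts[("normal", "normal")]    # normal traffic left alone
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return dr, far


# Toy labels invented purely for illustration (not data from the paper).
true_labels = ["normal", "dos", "probe", "normal", "r2l", "u2r", "normal", "dos"]
pred_labels = ["normal", "dos", "normal", "dos", "r2l", "u2r", "normal", "dos"]

y_true = [to_binary(label) for label in true_labels]
y_pred = [to_binary(label) for label in pred_labels]
dr, far = detection_and_false_alarm_rate(y_true, y_pred)
print(f"DR = {dr:.2%}, FAR = {far:.2%}")
```

On these toy labels, four of the five attacks are detected and one of the three normal records is flagged, so the sketch reports a DR of 80.00% and a FAR of 33.33%. Under the same assumed definitions, a FAR of 0 means that no normal record was classified as an anomaly.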