Data Randomization and Cluster-Based Partitioning for Botnet Intrusion Detection

Omar Y Al-Jarrah; Omar Alhussein; Paul D Yoo; Sami Muhaidat; Kamal Taha; Kwangjo Kim

doi:10.1109/TCYB.2015.2490802

IEEE Trans Cybern. 2016 Aug;46(8):1796-806. doi: 10.1109/TCYB.2015.2490802. Epub 2015 Oct 30.

PMID: 26540724

Abstract

Botnets, which consist of remotely controlled compromised machines called bots, provide a distributed platform for several threats against cyber world entities and enterprises. An intrusion detection system (IDS) provides an efficient countermeasure against botnets: it continually monitors and analyzes network traffic for potential vulnerabilities and for the possible presence of active attacks. A payload-inspection-based IDS (PI-IDS) identifies active intrusion attempts by inspecting the payloads of transmission control protocol (TCP) and user datagram protocol (UDP) packets and comparing them with the signatures of previously seen attacks. However, packet encryption can incapacitate a PI-IDS's ability to detect intrusions. A traffic-based IDS (T-IDS) alleviates this shortcoming because it does not inspect packet payloads; instead, it analyzes packet headers to identify intrusions. As network traffic grows rapidly, not only is the detection rate critical, but the efficiency and scalability of an IDS also become increasingly significant. In this paper, we propose a state-of-the-art T-IDS built on a novel randomized data partitioned learning model (RDPLM), which relies on a compact network feature set and feature selection techniques, simplified subspacing, and a multiple randomized meta-learning technique. The proposed model achieved 99.984% accuracy and a training time of 21.38 s on a well-known benchmark botnet dataset. Experimental results demonstrate that the proposed methodology outperforms other well-known machine-learning models used for the same detection task, namely sequential minimal optimization, a deep neural network, C4.5, reduced error pruning tree, and randomTree.
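The general idea of combining randomized data partitioning, feature subspacing, and a voting meta-learner can be illustrated with a minimal sketch. This is not the authors' RDPLM: it assumes plain random shuffling with disjoint partitions (rather than the cluster-based partitioning named in the title), the standard random subspace method for feature subsets, a hypothetical nearest-centroid base learner, and simple majority voting. The names `NearestCentroid`, `train_randomized_ensemble`, and `predict_vote` are hypothetical, and the feature vectors stand in for header-derived traffic features.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# (feature vector assumed to be derived from packet headers, label: 1 = bot, 0 = normal)
Sample = Tuple[List[float], int]


class NearestCentroid:
    """Hypothetical, deliberately simple base learner used only for illustration."""

    def fit(self, samples: Sequence[Sample], features: List[int]) -> "NearestCentroid":
        self.features = features  # indices of the random feature subspace this member sees
        sums: Dict[int, List[float]] = {}
        counts: Counter = Counter()
        for x, y in samples:
            acc = sums.setdefault(y, [0.0] * len(features))
            for j, f in enumerate(features):
                acc[j] += x[f]
            counts[y] += 1
        # Per-class mean vector within this member's feature subspace.
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x: List[float]) -> int:
        proj = [x[f] for f in self.features]
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            self.centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(proj, self.centroids[y])),
        )


def train_randomized_ensemble(
    data: Sequence[Sample], n_members: int, n_features: int, seed: int = 0
) -> List[NearestCentroid]:
    """Shuffle the data, split it into n_members disjoint partitions, and train one
    member per partition on a randomly chosen subset of n_features features."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    dim = len(shuffled[0][0])
    models = []
    for i in range(n_members):
        part = shuffled[i::n_members]  # disjoint, roughly equal-sized partitions
        if not part:
            continue
        subspace = sorted(rng.sample(range(dim), n_features))
        models.append(NearestCentroid().fit(part, subspace))
    return models


def predict_vote(models: List[NearestCentroid], x: List[float]) -> int:
    """Combine the ensemble members by simple majority vote."""
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]
```

Because each member trains on only a fraction of the samples and a fraction of the features, training cost per member stays small and the members can be trained independently; an odd `n_members` avoids tied votes in the two-class case.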