Comprehensive analysis and recommendation of feature evaluation measures for intrusion detection

Heliyon. 2020 Jul 9;6(7):e04262. doi: 10.1016/j.heliyon.2020.e04262. eCollection 2020 Jul.

Abstract

Revolutionary advances in network technologies have spurred the design of advanced cyberattacks that bypass traditional security defenses, with severe consequences. The Intrusion Detection System (IDS) is now considered a pivotal element of network security infrastructure, providing a solid line of protection against cyberattacks. The main challenges facing an IDS are the curse of dimensionality and class imbalance, which tend to increase detection time and degrade efficiency. Feature selection therefore plays an important role in identifying the most significant features for intrusion detection. Although several feature evaluation measures have been proposed for feature selection in the literature, there is no consensus on which measures are best for intrusion detection. This work therefore aims to recommend the most appropriate feature evaluation measure for building an efficient IDS. To this end, four filter-based feature evaluation measures that stem from different theories, namely Consistency, Correlation, Information and Distance, are investigated for their potential to enhance the detection ability of an IDS model for different classes of attacks. In addition, the influence of the selected features on the classification accuracy of an IDS model is analyzed using four different categories of classifiers, namely K-nearest neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM) and Deep Belief Network (DBN). Finally, a two-step statistical significance test is conducted on the experimental results to determine which feature evaluation measure yields a statistically significant difference in IDS performance. All experimental comparisons are performed on two benchmark intrusion detection datasets, NSL-KDD and UNSW-NB15.
In these experiments, the consistency measure had the best influence on the IDS model, improving its detection ability with regard to detection rate (DR), false alarm rate (FAR) and kappa statistic (KS), and identifying the most significant features for intrusion detection. The analysis also reveals that RF is the ideal classifier to use in conjunction with any of these four feature evaluation measures, achieving better detection accuracy than the other classifiers. Based on the statistical results, we recommend the use of the consistency measure for designing an efficient IDS in terms of DR and FAR.
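
For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how DR, FAR and the kappa statistic are commonly computed from a binary (attack vs. normal) confusion matrix. It assumes the standard textbook definitions (DR as recall on the attack class, FAR as the false positive rate, KS as Cohen's kappa); the abstract does not state the exact formulas used in the paper, and the function name and example counts are hypothetical.

```python
def ids_metrics(tp, fn, fp, tn):
    """Return (DR, FAR, kappa) from binary confusion-matrix counts.

    Assumed definitions (not taken from the paper):
      tp: attacks flagged as attacks     fn: attacks missed
      fp: normal traffic flagged         tn: normal traffic passed
    """
    n = tp + fn + fp + tn
    dr = tp / (tp + fn)    # detection rate: share of attacks detected
    far = fp / (fp + tn)   # false alarm rate: share of normal traffic flagged

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return dr, far, kappa


# Hypothetical counts, for illustration only (not results from the paper).
dr, far, kappa = ids_metrics(tp=90, fn=10, fp=5, tn=95)
print(f"DR={dr:.2f} FAR={far:.2f} KS={kappa:.2f}")
```

A higher DR and KS together with a lower FAR indicate a better detector, which is the sense in which the feature evaluation measures are compared above.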

Keywords: Computer science; Consistency; Correlation; Cybersecurity; Deep belief network; Detection engine; Distance; Feature selection; Information gain; Intrusion detection; Response engine.