An improved X-means and isolation forest based methodology for network traffic anomaly detection

PLoS One. 2022 Jan 31;17(1):e0263423. doi: 10.1371/journal.pone.0263423. eCollection 2022.

Abstract

Anomaly detection in network traffic is becoming a challenging task due to the complexity of large-scale networks and the proliferation of various social network applications. In an actual industrial environment, only recently obtained unlabelled data can be used as the training set. For commonly used unsupervised algorithms, the accuracy of the abnormal ratio assumed for the training set as prior knowledge has a great influence on performance. This study proposes X-iForest, an anomaly detection algorithm based on X-means and iForest, which uses X-means to cluster the standard Euclidean distances between the abnormal points and the normal cluster centre and thereby achieves secondary filtering. We compared X-iForest with seven mainstream unsupervised algorithms in terms of the AUC and anomaly detection rates. Extensive experiments showed that X-iForest has notable advantages over the other algorithms and can be applied well to anomaly detection in large-scale network traffic data.
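The secondary-filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a first pass (for example, iForest) has already split the data into candidate anomalies and presumed-normal points, it reads "standard Euclidean distance" as a per-feature standardized distance, and it uses a plain one-dimensional two-means split as a stand-in for X-means (which would also choose the number of clusters). All function names and the toy data are hypothetical.

```python
import math
import statistics


def standardized_distances(candidates, normal_points):
    """Distance of each candidate to the centroid of the normal points,
    with every feature scaled by its standard deviation (assumed reading)."""
    dims = list(zip(*normal_points))
    centre = [statistics.fmean(d) for d in dims]
    # Population standard deviation per feature; use 1.0 for constant features.
    scale = [statistics.pstdev(d) or 1.0 for d in dims]
    return [
        math.sqrt(sum(((x - c) / s) ** 2 for x, c, s in zip(p, centre, scale)))
        for p in candidates
    ]


def two_means_1d(values, iterations=50):
    """Stand-in for X-means: split 1-D values into a near and a far cluster.
    Returns one boolean per value, True meaning 'far cluster'."""
    lo, hi = min(values), max(values)
    far = [False] * len(values)
    for _ in range(iterations):
        far = [abs(v - hi) < abs(v - lo) for v in values]
        near_vals = [v for v, f in zip(values, far) if not f]
        far_vals = [v for v, f in zip(values, far) if f]
        if not near_vals or not far_vals:
            break  # everything fell into one cluster
        new_lo, new_hi = statistics.fmean(near_vals), statistics.fmean(far_vals)
        if (new_lo, new_hi) == (lo, hi):
            break  # centres stopped moving
        lo, hi = new_lo, new_hi
    return far


def secondary_filter(candidates, normal_points):
    """Keep only the candidates whose distance lands in the far cluster."""
    distances = standardized_distances(candidates, normal_points)
    far = two_means_1d(distances)
    return [p for p, f in zip(candidates, far) if f]


# Toy data (hypothetical): four presumed-normal points and four candidates
# that a first-pass detector is assumed to have flagged.
normal_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
candidates = [(1.2, 1.1), (0.9, 1.3), (6.0, 5.5), (7.0, 6.0)]
confirmed = secondary_filter(candidates, normal_points)
print(confirmed)
```

In this toy run the two candidates that sit close to the normal centre are discarded as likely false positives and the two distant ones are kept, which is the intent of a second filtering pass: the final decision depends less on an assumed abnormal ratio in the training set.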

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Area Under Curve
  • Computer Simulation
  • Neural Networks, Computer*

Grants and funding

Grant numbers: 2016B010124012; Weihong Cai received award; Full name of funder: Science and Technology Planning Project of Guangdong Province; Grantees played a role in study design, data collection and analysis, publication decisions and manuscript preparation.

Grant numbers: 2019B010116001; Weihong Cai received award; Full name of funder: Science and Technology Planning Project of Guangdong Province; Grantees played a role in study design, data collection and analysis, publication decisions and manuscript preparation.