Malicious traffic detection on sampled network flow data with novelty-detection-based models

Sci Rep. 2023 Sep 18;13(1):15446. doi: 10.1038/s41598-023-42618-9.

Abstract

Cyber-attacks are a major problem for users, businesses, and institutions. Classical anomaly detection techniques can detect malicious traffic generated in a cyber-attack by analyzing individual network packets. However, routers that manage large traffic loads can only examine some packets. These devices often use lightweight flow-based protocols to collect network statistics. Analyzing flow data also allows for detecting malicious network traffic. But even gathering flow data has a high computational cost, so routers usually apply a sampling rate to generate flows. This sampling reduces the computational load on routers, but much information is lost. This work aims to demonstrate that malicious traffic can be detected even on flow data collected with a sampling rate of 1 out of 1,000 packets. To do so, we evaluate anomaly-detection-based models using synthetic sampled flow data and actual sampled flow data from RedCAYLE, the Castilla y León regional subnet of the Spanish academic and research network. The results presented show that detection of malicious traffic on sampled flow data is possible using novelty-detection-based models with a high accuracy score and a low false alarm rate.