Malicious traffic detection on sampled network flow data with novelty-detection-based models

Adrián Campazas-Vega; Ignacio Samuel Crespo-Martínez; Ángel Manuel Guerrero-Higueras; Claudia Álvarez-Aparicio; Vicente Matellán; Camino Fernández-Llamas

doi:10.1038/s41598-023-42618-9

Sci Rep. 2023 Sep 18;13(1):15446. doi: 10.1038/s41598-023-42618-9.

Authors

Adrián Campazas-Vega¹, Ignacio Samuel Crespo-Martínez², Ángel Manuel Guerrero-Higueras³, Claudia Álvarez-Aparicio³, Vicente Matellán², Camino Fernández-Llamas³

Affiliations

¹ Robotics Group, University of León, Campus de Vegazana s/n, 24071, León, Spain. acamv@unileon.es.
² Supercomputación Castilla y León (SCAYLE), Campus de Vegazana s/n, 24071, León, Spain.
³ Robotics Group, University of León, Campus de Vegazana s/n, 24071, León, Spain.

Abstract

Cyber-attacks are a major problem for users, businesses, and institutions. Classical anomaly detection techniques can detect malicious traffic generated in a cyber-attack by analyzing individual network packets. However, routers that manage large traffic loads can only examine some packets. These devices often use lightweight flow-based protocols to collect network statistics. Analyzing flow data also allows for detecting malicious network traffic. But even gathering flow data has a high computational cost, so routers usually apply a sampling rate to generate flows. This sampling reduces the computational load on routers, but much information is lost. This work aims to demonstrate that malicious traffic can be detected even on flow data collected with a sampling rate of 1 out of 1,000 packets. To do so, we evaluate anomaly-detection-based models using synthetic sampled flow data and actual sampled flow data from RedCAYLE, the Castilla y León regional subnet of the Spanish academic and research network. The results presented show that detection of malicious traffic on sampled flow data is possible using novelty-detection-based models with a high accuracy score and a low false alarm rate.
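
To make the information loss from sampling concrete, the following is a minimal sketch, not the authors' pipeline. It assumes deterministic 1-in-N sampling (routers may instead sample at random), a flow keyed by the usual 5-tuple, and entirely hypothetical traffic. The function names `sample_packets` and `build_flows` are illustrative only.

```python
from collections import defaultdict


def sample_packets(packets, rate=1000):
    """Keep one packet out of every `rate` (deterministic sampling, an assumption)."""
    return [p for i, p in enumerate(packets) if i % rate == 0]


def build_flows(packets):
    """Aggregate packets into flows keyed by (src, dst, sport, dport, proto)."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, sport, dport, proto, size in packets:
        key = (src, dst, sport, dport, proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)


# Hypothetical traffic: one long transfer of 5,000 packets,
# followed by 200 short flows of 3 packets each.
packets = [("10.0.0.1", "10.0.0.2", 40000, 443, "tcp", 1500)] * 5000
for i in range(200):
    packets += [("10.0.0.3", "10.0.0.2", 50000 + i, 80, "tcp", 60)] * 3

full_flows = build_flows(packets)
sampled_flows = build_flows(sample_packets(packets, rate=1000))

# Compare how many flows are visible before and after sampling.
print(len(full_flows), len(sampled_flows))
```

In this toy case almost all of the short flows vanish from the sampled view, and the surviving flows carry far smaller packet and byte counters. Any detection model working on sampled flow data has to cope with this kind of sparse input.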
