A detailed study of resampling algorithms for cyberattack classification in engineering applications

Óscar Mogollón Gutiérrez; José Carlos Sancho Núñez; Mar Ávila; Andrés Caro

doi:10.7717/peerj-cs.1975

PeerJ Comput Sci. 2024 Apr 15:10:e1975. doi: 10.7717/peerj-cs.1975. eCollection 2024.

Authors

Óscar Mogollón Gutiérrez¹, José Carlos Sancho Núñez¹, Mar Ávila¹, Andrés Caro¹

Affiliation

¹ Escuela Politecnica, University of Extremadura, Cáceres, Cáceres, Spain.

Abstract

The evolution of engineering applications is highly relevant to the protection of industrial systems. As industries become increasingly interconnected, the need for robust cybersecurity measures becomes paramount. Engineering informatics not only provides tools for knowledge representation and extraction but also supports the development of sophisticated cybersecurity solutions. However, safeguarding industrial systems poses a unique challenge due to the inherent heterogeneity of data within these environments. In addition, datasets that simulate real cyberattacks within these diverse environments are highly imbalanced, often skewed towards certain types of traffic. This study proposes a system for addressing class imbalance in cybersecurity. To do this, three oversampling methods (SMOTE, Borderline1-SMOTE, and ADASYN) and five undersampling methods (random undersampling, cluster centroids, NearMiss, repeated edited nearest neighbor, and Tomek Links) are tested. Specifically, these balancing algorithms are used to generate one-vs-rest binary models and to develop a two-stage classification system. In doing so, this study aims to enhance the efficacy of cybersecurity measures, ensuring a more comprehensive understanding of, and defense against, the diverse range of threats encountered in industrial environments. Experimental results demonstrate the effectiveness of the proposed system for cyberattack detection and classification among nine widely known cyberattacks.
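
As an illustration of the idea of balancing a one-vs-rest binary problem, the following minimal sketch relabels a multiclass target for a single attack category and then balances it in two ways: random undersampling and a simplified SMOTE-style interpolation. It is not the authors' implementation. The function names, the toy feature values, and the parameter k are assumptions made for this example, and the interpolation is a reduced version of SMOTE written with the Python standard library only.

```python
import random
from collections import Counter


def one_vs_rest_labels(labels, positive_class):
    """Relabel a multiclass target as binary: 1 for positive_class, else 0."""
    return [1 if y == positive_class else 0 for y in labels]


def random_undersample(X, y, seed=0):
    """Randomly drop rows of the larger class until both classes are equal."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like_oversample(X, y, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative, not full SMOTE).

    Each synthetic row is a random point on the segment between a minority
    row and one of its k nearest minority neighbours. Assumes the minority
    class has at least two rows.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority_size = max(counts.values())
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(majority_size - len(minority_rows)):
        base = rng.choice(minority_rows)
        # Rank the other minority rows by squared Euclidean distance to base.
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: two numeric features per flow, three traffic labels.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.15, 0.25], [0.25, 0.15],
     [0.2, 0.3], [0.9, 0.8], [0.8, 0.9], [0.5, 0.9]]
labels = ["Normal"] * 6 + ["DoS", "DoS", "Exploits"]

# One binary problem per attack category; here only "DoS" versus the rest.
y_dos = one_vs_rest_labels(labels, "DoS")
X_under, y_under = random_undersample(X, y_dos)
X_over, y_over = smote_like_oversample(X, y_dos)
```

In a one-vs-rest setup this relabel-and-balance step would be repeated once per attack category, and a separate binary classifier would then be trained on each balanced set. Undersampling discards majority rows, while the interpolation approach adds synthetic minority rows, so the two produce training sets of very different sizes from the same data.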

Keywords: Attack classification; Cyber-physical systems; Imbalanced learning; Intrusion detection; UNSW-NB15.

Grants and funding

This initiative is carried out within the framework of the funds of the Recovery, Transformation and Resilience Plan, financed by the European Union (Next Generation). The publication is part of the Spanish Strategic Cybersecurity Project “Detection of Identity Document Forgery using Computer Vision and Artificial Intelligence Techniques (C108/23)” funded by Instituto Nacional de Ciberseguridad de España (INCIBE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.