On Detecting Cryptojacking on Websites: Revisiting the Use of Classifiers

Fredy Andrés Aponte-Novoa; Daniel Povedano Álvarez; Ricardo Villanueva-Polanco; Ana Lucila Sandoval Orozco; Luis Javier García Villalba

doi:10.3390/s22239219

Sensors (Basel). 2022 Nov 27;22(23):9219. doi: 10.3390/s22239219.

Authors

Fredy Andrés Aponte-Novoa¹ ², Daniel Povedano Álvarez³, Ricardo Villanueva-Polanco¹, Ana Lucila Sandoval Orozco³, Luis Javier García Villalba³

Affiliations

¹ Department of Computer Science and Engineering, Universidad del Norte, Barranquilla 081007, Colombia.
² Department of Systems Engineering, Universidad Santo Tomás, Tunja 150003, Colombia.
³ Group of Analysis, Security and Systems (GASS), Department of Software Engineering and Artificial Intelligence (DISIA), Faculty of Computer Science and Engineering, Office 431, Universidad Complutense de Madrid (UCM), Calle Profesor José García Santesmases 9, Ciudad Universitaria, 28040 Madrid, Spain.

Abstract

Cryptojacking, or illegal mining, is a form of malware that hides in the victim's computer and uses its computational resources to mine cryptocurrencies for the attacker's benefit. It consumes significant computational resources, reducing the efficiency of the victim's computer. This attack has become more common because of the rise and profitability of cryptocurrencies and because it is difficult for users to detect. Identifying and blocking this type of malware has become a research topic related to cryptocurrencies and blockchain technology; the literature presents some machine learning and deep learning techniques, but they still leave room for improvement. In this work, we explore multiple machine learning classification models for detecting cryptojacking on websites, such as Logistic Regression, Decision Tree, Random Forest, Gradient Boosting Classifier, k-Nearest Neighbor, and XGBoost. To this end, we use a dataset composed of samples of network and host features, to which we apply various feature selection methods, including statistical methods such as the ANOVA test and wrapper methods, not only to reduce the complexity of the resulting models but also to discover the features with the greatest predictive power. Our results suggest that simple models such as Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and k-Nearest Neighbor can achieve success rates similar to or greater than those of advanced algorithms such as XGBoost, and even those reported in other works based on Deep Learning.
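As an illustration of the ANOVA-based feature selection mentioned in the abstract, the following minimal sketch ranks features by their one-way ANOVA F-statistic and keeps the top k. It is not the authors' code. The feature names (`cpu_load`, `packets_out`, `noise`), the toy samples, and the helper names `anova_f` and `top_k_features` are all hypothetical and chosen only for the example; the paper's actual dataset and selected features are not reproduced here.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(rows, labels, names, k):
    """Return the k feature names with the highest F-statistic."""
    scores = {
        name: anova_f([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy samples: label 0 = benign page, label 1 = mining page.
names = ["cpu_load", "packets_out", "noise"]
rows = [
    (0.10, 12, 5),
    (0.15, 9, 3),
    (0.20, 15, 4),
    (0.90, 40, 4),
    (0.85, 44, 5),
    (0.95, 42, 3),
]
labels = [0, 0, 0, 1, 1, 1]

selected = top_k_features(rows, labels, names, k=2)
print(selected)
```

A feature whose class means differ strongly relative to its within-class spread receives a high F-statistic, so a filter of this kind can discard uninformative features before any classifier is trained. Wrapper methods, by contrast, score feature subsets by training a model on them.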

Keywords: blockchain; cryptojacking; illegal mining; machine learning; malware.

MeSH terms

Algorithms*
Logistic Models
Machine Learning*

Grants and funding

779 de 2017/Colciencias