ZeVigilante: Detecting Zero-Day Malware Using Machine Learning and Sandboxing Analysis Techniques

Fahd Alhaidari; Nouran Abu Shaib; Maram Alsafi; Haneen Alharbi; Majd Alawami; Reem Aljindan; Atta-Ur Rahman; Rachid Zagrouba

doi:10.1155/2022/1615528

Comput Intell Neurosci. 2022 May 9:2022:1615528. doi: 10.1155/2022/1615528. eCollection 2022.

Authors

Fahd Alhaidari¹,², Nouran Abu Shaib², Maram Alsafi², Haneen Alharbi², Majd Alawami², Reem Aljindan², Atta-Ur Rahman³, Rachid Zagrouba¹,⁴

Affiliations

¹ Saudi Aramco Cybersecurity Chair, Dhahran, Saudi Arabia.
² Department of Networks and Communications, College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia.
³ Department of Computer Science, College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia.
⁴ Department of Computer Information Systems, College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia.

Abstract

Given the enormous growth and severe impact of undocumented malicious software, known as Zero-Day malware, specialized practices have been combined to build systems capable of detecting such software and averting potentially disastrous consequences. Owing to the nature of Zero-Day malware, its developers use distinct evasion tactics to keep it stealthy. Hence, advanced investigation of methods that can identify this kind of malware is needed. Machine learning (ML) is among the promising techniques for this type of prediction, while a sandbox provides a safe environment for such experiments. After a thorough literature review, carefully chosen ML techniques are proposed for malware detection in a Cuckoo Sandbox (CS) environment. The proposed system, named Zero-Day Vigilante (ZeVigilante), detects malware using both static and dynamic analyses. We used adequate datasets for both analyses, incorporating a sufficient number of samples, in contrast to other studies. The processed datasets were then used to train and test several ML classifiers, including Random Forest (RF), Neural Networks (NN), Decision Tree (DT), k-Nearest Neighbor (kNN), Naïve Bayes (NB), and Support Vector Machine (SVM). RF achieved the best accuracy for both static and dynamic analyses: 98.21% and 98.92%, respectively.
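To make the dynamic-analysis train-and-test step more concrete, the following is a minimal sketch in standard-library Python, not the authors' implementation. It turns a sandbox behavior report into a vector of API-call counts, classifies it with k-Nearest Neighbor, and scores accuracy. Several details are assumptions for illustration only. The `API_VOCAB` feature list is hypothetical, since the abstract does not give the ZeVigilante feature set. The report layout `report["behavior"]["apistats"] = {pid: {api: count}}` is modeled on Cuckoo's JSON report and should be checked against the Cuckoo version in use. The samples are toy data. kNN is used only because it fits in a few lines; the abstract reports RF, not kNN, as the most accurate classifier.

```python
from collections import Counter
from math import sqrt

# Hypothetical feature vocabulary (assumption): the real ZeVigilante features are not listed in the abstract.
API_VOCAB = ["CreateRemoteThread", "WriteProcessMemory", "RegSetValueExW",
             "InternetOpenUrlW", "CreateFileW", "ReadFile"]


def api_count_vector(report: dict) -> list[float]:
    """Sum per-process API call counts into one vector ordered by API_VOCAB.

    Assumes a Cuckoo-style layout: report["behavior"]["apistats"] = {pid: {api: count}}.
    APIs outside API_VOCAB are ignored; missing sections yield an all-zero vector.
    """
    totals: Counter = Counter()
    for per_process in report.get("behavior", {}).get("apistats", {}).values():
        totals.update(per_process)  # Counter.update adds counts from a mapping
    return [float(totals[api]) for api in API_VOCAB]


def euclidean(a: list[float], b: list[float]) -> float:
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: list[tuple[list[float], int]], x: list[float], k: int = 3) -> int:
    """Majority vote among the k nearest training vectors (1 = malware, 0 = benign)."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def make_report(counts: dict[str, int]) -> dict:
    """Build a toy single-process report in the assumed layout (hypothetical helper)."""
    return {"behavior": {"apistats": {"1234": counts}}}


# Toy labeled samples (fabricated for illustration, not from the paper's datasets).
train_reports = [
    (make_report({"CreateRemoteThread": 5, "WriteProcessMemory": 8, "RegSetValueExW": 3}), 1),
    (make_report({"WriteProcessMemory": 6, "InternetOpenUrlW": 4, "CreateRemoteThread": 3}), 1),
    (make_report({"RegSetValueExW": 7, "InternetOpenUrlW": 5, "WriteProcessMemory": 4}), 1),
    (make_report({"CreateFileW": 10, "ReadFile": 12}), 0),
    (make_report({"CreateFileW": 6, "ReadFile": 9, "RegSetValueExW": 1}), 0),
    (make_report({"ReadFile": 15, "CreateFileW": 8}), 0),
]
train = [(api_count_vector(r), y) for r, y in train_reports]

test_reports = [
    (make_report({"CreateRemoteThread": 4, "WriteProcessMemory": 7}), 1),
    (make_report({"CreateFileW": 9, "ReadFile": 11}), 0),
]
y_true = [y for _, y in test_reports]
y_pred = [knn_predict(train, api_count_vector(r)) for r, _ in test_reports]
print(y_pred, accuracy(y_true, y_pred))  # prints [1, 0] 1.0 on this toy data
```

Swapping `knn_predict` for any of the other listed classifiers (for example, an RF from a dedicated ML library) leaves the feature-extraction and accuracy steps unchanged, which mirrors how one processed dataset can be used to compare several classifiers.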

MeSH terms

Algorithms*
Bayes Theorem
Machine Learning*
Neural Networks, Computer
Software
Support Vector Machine