Phishing URLs Detection Using Sequential and Parallel ML Techniques: Comparative Analysis

Naya Nagy; Malak Aljabri; Afrah Shaahid; Amnah Albin Ahmed; Fatima Alnasser; Linda Almakramy; Manar Alhadab; Shahad Alfaddagh

doi:10.3390/s23073467

Sensors (Basel). 2023 Mar 26;23(7):3467. doi: 10.3390/s23073467.

Authors

Naya Nagy¹, Malak Aljabri², Afrah Shaahid³, Amnah Albin Ahmed³, Fatima Alnasser³, Linda Almakramy³, Manar Alhadab³, Shahad Alfaddagh³

Affiliations

¹ SAUDI ARAMCO Cybersecurity Chair, Department of Networks and Communication, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia.
² Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah 21955, Saudi Arabia.
³ SAUDI ARAMCO Cybersecurity Chair, Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia.

Abstract

In today's digital era, web services are a vital part of daily life and are accessed through uniform resource locators (URLs). Cybercriminals constantly adapt to new security technologies and use URLs to exploit vulnerabilities for illicit gain, such as stealing users' personal and sensitive data, which can lead to financial loss, reputational damage, ransomware, or the spread of malicious infections and catastrophic cyber-attacks such as phishing. Phishing attacks are recognized as the leading source of data breaches and the most prevalent form of cyber-attack scam. Artificial intelligence (AI)-based techniques such as machine learning (ML) and deep learning (DL) have proven highly effective in detecting phishing attacks. Nevertheless, sequential ML can be time-intensive, inefficient for real-time detection, and unable to handle vast amounts of data. Parallel computing techniques can help build precise, robust, and effective ML models for detecting phishing attacks with less computation time. Therefore, in this study, we used various multiprocessing and multithreading techniques in Python to train ML and DL models. The dataset comprised 54 K records for training and 12 K for testing. Five experiments were carried out: the first based on sequential execution and the remaining four based on parallel execution techniques (threading using the Python parallel backend, threading using the Python parallel backend with a specified number of jobs, manual threading, and multiprocessing using the Python parallel backend). Four models, namely random forest (RF), naïve Bayes (NB), convolutional neural network (CNN), and long short-term memory (LSTM), were used in the experiments. Overall, the experiments yielded excellent results and speedup. Finally, a comprehensive comparative analysis was performed to consolidate the findings.

Keywords: cyber-attacks; deep learning; machine learning; parallel processing; phishing attacks.
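
The sequential-versus-parallel comparison described in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' code: the toy lexical features, the helper names (`extract_features`, `run_sequential`, `run_threaded`, `timed`, `speedup`), the sample URLs, and the chunking scheme are all assumptions made for illustration. The sketch shows the "manual threading" pattern on a stand-in workload and computes speedup as sequential time divided by parallel time.

```python
import threading
import time


def extract_features(url):
    """Toy lexical features (hypothetical, not the paper's feature set)."""
    return (len(url), url.count("."), url.count("-"), int("@" in url))


def run_sequential(urls):
    """Baseline: process every URL in a single thread."""
    return [extract_features(u) for u in urls]


def run_threaded(urls, n_threads=4):
    """Manual threading: one thread per contiguous chunk of URLs."""
    results = [None] * len(urls)
    # Ceiling division; max(1, ...) keeps the range step valid for empty input.
    chunk = max(1, (len(urls) + n_threads - 1) // n_threads)

    def worker(start):
        # Each thread writes only to its own slice, so no lock is needed.
        for i in range(start, min(start + chunk, len(urls))):
            results[i] = extract_features(urls[i])

    threads = [threading.Thread(target=worker, args=(s,))
               for s in range(0, len(urls), chunk)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start


def speedup(t_sequential, t_parallel):
    """Speedup = sequential time / parallel time."""
    return t_sequential / t_parallel


if __name__ == "__main__":
    # Synthetic URLs, assumed purely for the demonstration.
    urls = [f"http://example-{i}.test/login?id={i}" for i in range(100_000)]
    seq, t_seq = timed(run_sequential, urls)
    par, t_par = timed(run_threaded, urls)
    assert seq == par  # both strategies must produce identical features
    print(f"speedup: {speedup(t_seq, t_par):.2f}x")
```

In standard CPython, threads running pure-Python CPU-bound code are constrained by the global interpreter lock, so this toy workload may show little or no speedup. Measurable gains from threading generally require work that releases the lock, such as I/O or native numerical routines, which is one reason the choice between threading and multiprocessing is worth comparing empirically.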

Grants and funding

SAUDI ARAMCO Cybersecurity Chair at Imam Abdulrahman Bin Faisal University (IAU).