Cloud-based email phishing attack using machine and deep learning algorithm

Umer Ahmed Butt; Rashid Amin; Hamza Aldabbas; Senthilkumar Mohan; Bader Alouffi; Ali Ahmadian

doi:10.1007/s40747-022-00760-3

Complex Intell Systems. 2023;9(3):3043-3070. doi: 10.1007/s40747-022-00760-3. Epub 2022 Jun 2.

Authors

Umer Ahmed Butt¹, Rashid Amin¹,², Hamza Aldabbas³, Senthilkumar Mohan⁴, Bader Alouffi⁵, Ali Ahmadian⁶

Affiliations

¹ Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan.
² Department of Computer Science, University of Chakwal, Chakwal, Pakistan.
³ Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan.
⁴ School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamilnadu 632014 India.
⁵ Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif, 21944 Saudi Arabia.
⁶ Department of Mathematics, Near East University, Nicosia, TRNC, Mersin 10 Turkey.

Abstract

Cloud computing refers to the on-demand availability of computer system resources, specifically data storage and processing power, without direct management by the user. Email is commonly used by individuals and groups to send and receive data, and financial data, credit reports, and other sensitive information are often sent over the Internet. Phishing is a fraud technique in which an attacker obtains sensitive data from users by appearing to be a trusted source; a phished email misleads the recipient into disclosing confidential data. The main problem addressed here is phishing attacks carried out while sending and receiving email: the attacker sends spam by email and obtains the recipient's data when the email is opened and read. In recent years, this has become a widespread problem. This paper uses legitimate and phishing datasets of different sizes, detects new emails, and applies different features and algorithms for classification. A modified dataset is created after evaluating the existing approaches. We created a comma-separated values (CSV) file of extracted features and a label file, then applied the support vector machine (SVM), Naive Bayes (NB), and long short-term memory (LSTM) algorithms. The experiments treat the recognition of a phished email as a classification problem. According to the comparison and implementation, SVM, NB, and LSTM perform better and detect email phishing attacks more accurately. The SVM, NB, and LSTM classifiers achieve highest accuracies of 99.62%, 97%, and 98%, respectively.

Keywords: Extract feature; Feature selection; Label data; Long short term memory (LSTM); Machine learning; Phishing dataset; Phishing detection; Support vector machine (SVM) classification; Text processing.
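
As an illustration only of treating phishing detection as a text classification problem, the sketch below trains a multinomial Naive Bayes classifier with Laplace smoothing on a handful of made-up emails. The toy messages, the whitespace tokenizer, and the function names `train_nb` and `predict_nb` are assumptions for this example; they are not the dataset, features, or implementation used in the paper, and the sketch says nothing about the reported accuracies.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data for illustration; not the paper's dataset.
TRAIN = [
    ("verify your account password now", "phishing"),
    ("urgent click link to confirm bank details", "phishing"),
    ("your account is suspended click here", "phishing"),
    ("meeting agenda for monday attached", "legitimate"),
    ("please review the project report", "legitimate"),
    ("lunch on friday with the team", "legitimate"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple feature extractor)."""
    return text.lower().split()


def train_nb(samples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior under Naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
print(predict_nb(model, "click to verify your bank account"))
print(predict_nb(model, "project meeting on monday"))
```

A real pipeline along the lines the abstract describes would read the extracted features and labels from files rather than from an inline list, and would evaluate on held-out emails before reporting accuracy.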