PermDroid a framework developed using proposed feature selection approach and machine learning techniques for Android malware detection

Arvind Mahindru; Himani Arora; Abhinav Kumar; Sachin Kumar Gupta; Shubham Mahajan; Seifedine Kadry; Jungeun Kim

doi:10.1038/s41598-024-60982-y

Sci Rep. 2024 May 10;14(1):10724. doi: 10.1038/s41598-024-60982-y.

Authors

Arvind Mahindru¹, Himani Arora², Abhinav Kumar³, Sachin Kumar Gupta⁴,⁵, Shubham Mahajan⁶, Seifedine Kadry⁷,⁸,⁹,¹⁰, Jungeun Kim¹¹

Affiliations

¹ Department of Computer Science and Applications, D.A.V. University, Sarmastpur, Jalandhar, 144012, India. er.arvindmahindru@gmail.com.
² Department of Mathematics, Guru Nanak Dev University, Amritsar, India.
³ Department of Nuclear and Renewable Energy, Ural Federal University Named after the First President of Russia Boris Yeltsin, Ekaterinburg, Russia, 620002.
⁴ Department of Electronics and Communication Engineering, Central University of Jammu, Jammu, 181143, UT of J&K, India. sachin.ece@cujammu.ac.in.
⁵ School of Electronics and Communication Engineering, Shri Mata Vaishno Devi University, Katra, 182320, UT of J&K, India. sachin.ece@cujammu.ac.in.
⁶ Department of Applied Data Science, Noroff University College, Kristiansand, Norway. mahajanshubham2232579@gmail.com.
⁷ Department of Applied Data Science, Noroff University College, Kristiansand, Norway.
⁸ Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, 346, United Arab Emirates.
⁹ MEU Research Unit, Middle East University, Amman 11831, Jordan.
¹⁰ Applied Science Research Center, Applied Science Private University, Amman, Jordan.
¹¹ Department of Software, Department of Computer Science and Engineering, Kongju National University, Cheonan, 31080, Korea. jekim@kongju.ac.kr.

PMID: 38730228
DOI: 10.1038/s41598-024-60982-y

Abstract

Developing an Android malware detection framework that can identify malware in real-world apps remains a difficult challenge for academicians and researchers. The vulnerability lies in Android's permission model, which has led many researchers to build malware detection models based on a single permission or a set of permissions. Previous studies have used all extracted features, which overburdens the construction of malware detection models. However, the effectiveness of a machine learning model depends on relevant features, which have strong discriminative power and help reduce misclassification errors. This research paper proposes a feature selection framework that helps select the relevant features. In the first stage of the proposed framework, the t-test and univariate logistic regression are applied to our collected feature data set to assess each feature's capacity for detecting malware. In the second stage, multivariate linear regression stepwise forward selection and correlation analysis are applied to evaluate the correctness of the features selected in the first stage. The resulting features are then used as input to develop malware detection models using three ensemble methods and a neural network with six different machine-learning algorithms. The performance of the developed models is compared using two performance parameters: F-measure and accuracy. The experiment is performed on half a million different Android apps. The empirical findings reveal that the malware detection model developed using features selected by the proposed feature selection framework achieved a higher detection rate than the model developed using the full set of extracted features. Further, when compared with previously developed frameworks or methodologies, the experimental results indicate that the model developed in this study achieved an accuracy of 98.8%.
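The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it covers only a t-test screen (stage one) and a correlation check (stage two), and it leaves out univariate logistic regression and stepwise forward selection. The function names, the toy permission matrix, and the thresholds `t_min` and `r_max` are assumptions chosen for illustration, and the screen thresholds the Welch t statistic directly instead of computing a p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: perfectly separated, or identical.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(X, y, names, t_min=2.0, r_max=0.9):
    """Keep features that separate the classes and are not redundant.

    X: list of rows (one per app), y: labels (1 = malware, 0 = benign).
    t_min and r_max are illustrative thresholds, not values from the paper.
    """
    cols = {n: [row[j] for row in X] for j, n in enumerate(names)}

    # Stage one (sketch): screen each feature by |t| between the two classes.
    scored = []
    for n in names:
        mal = [v for v, label in zip(cols[n], y) if label == 1]
        ben = [v for v, label in zip(cols[n], y) if label == 0]
        t = abs(welch_t(mal, ben))
        if t >= t_min:
            scored.append((t, n))

    # Stage two (sketch): drop features highly correlated with one already kept.
    scored.sort(reverse=True)
    kept = []
    for _, n in scored:
        if all(abs(pearson(cols[n], cols[k])) < r_max for k in kept):
            kept.append(n)
    return kept


# Toy data: 4 malware apps followed by 4 benign apps, 4 binary permission flags.
names = ["SEND_SMS", "READ_SMS", "INTERNET", "CAMERA"]
X = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

print(select_features(X, y, names))  # ['SEND_SMS']
```

In this toy example, INTERNET and CAMERA are screened out because their class means are identical, and READ_SMS is dropped as redundant because it is perfectly correlated with SEND_SMS.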

Keywords: API calls; Android apps; Deep learning; Feature selection; Intrusion detection; Neural network; Permissions model.