Android malware detection method based on highly distinguishable static features and DenseNet

PLoS One. 2022 Nov 23;17(11):e0276332. doi: 10.1371/journal.pone.0276332. eCollection 2022.

Abstract

The rapid growth of malware has become a serious problem that threatens the security of the mobile ecosystem and needs to be studied and resolved. Android is the main target of attackers due to its open source and popularity. To solve this serious problem, an accurate and efficient malware detection method is needed. Most existing methods use a single type of feature, which can be easily bypassed, resulting in low detection accuracy. In addition, although multiple types of features are used in some methods to solve the drawbacks of detection methods using a single type of feature, there are still some problems. Firstly, due to multiple types of features, the number of features in the initial feature set is extremely large, and some methods directly use them for training, resulting in excessive overhead. Furthermore, some methods utilize feature selection to reduce the dimensionality of features, but they do not select highly distinguishable features, resulting in poor detection performance. In this article, an effective and accurate method for identifying Android malware, which is based on an analysis of the use of seven types of static features in Android is proposed to cope with the rapid increase in the amount of Android malware and overcome the drawbacks of detection methods using a single type of feature. Instead of utilizing all extracted features, we design three levels of feature selection methods to obtain highly distinguishable features that can be effective in identifying malware. Then a fully densely connected convolutional network based on DenseNet is adopted to leverage features more efficiently and effectively for malware detection. Compared with the number of features in the original feature set, the number of features in the feature set obtained by the three levels of feature selection methods is reduced by about 97%, but the accuracy is only reduced by 0.45%, and the accuracy is more than 99% in a variety of machine learning methods. Moreover, we compare our detection method with different machine learning models, and the experimental results show that our method outperforms general machine learning models. We also compare the performance of our detection method with two state-of-the-art neural networks. The experimental results show that our detection model can greatly reduce the training cost and still achieve good detection performance, reaching an accuracy of 99.72%. In addition, we compare our detection method with other similar detection methods that also use multiple types of features. The results show that our detection method is superior to the comparison methods.

MeSH terms

  • Adaptation, Psychological*
  • Cost of Illness
  • Ecosystem*
  • Machine Learning
  • Neural Networks, Computer

Grants and funding

The author(s) received no specific funding for this work.