An Android Malware Detection Approach to Enhance Node Feature Differences in a Function Call Graph Based on GCNs

Haojie Wu; Nurbol Luktarhan; Gaoqi Tian; Yangyang Song

doi:10.3390/s23104729

Sensors (Basel). 2023 May 13;23(10):4729. doi: 10.3390/s23104729.

Authors

Haojie Wu¹, Nurbol Luktarhan², Gaoqi Tian¹, Yangyang Song²

Affiliations

¹ School of Software, Xinjiang University, Urumqi 830091, China.
² College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China.

Abstract

The smartphone has become an indispensable tool in daily life, and the Android operating system is widely installed on smartphones. This makes Android smartphones a prime target for malware. To address the threats posed by malware, researchers have proposed many detection approaches, including ones based on the function call graph (FCG). Although an FCG can capture the complete caller-callee semantic relationships of a function, it is represented as a very large graph structure, and the many nonsensical nodes it contains reduce detection efficiency. At the same time, the propagation process in graph neural networks (GNNs) tends to make the features of important nodes in the FCG resemble those of the nonsensical nodes. In our work, we propose an Android malware detection approach that enhances node feature differences in an FCG. First, we propose an API-based node feature with which we can visually analyze the behavioral properties of different functions in the app and determine whether their behavior is benign or malicious. Then, we extract the FCG and the features of each function from the decompiled APK file. Next, we calculate an API coefficient inspired by the TF-IDF algorithm and extract the sensitive function call subgraph (S-FCSG) based on the API coefficient ranking. Finally, before feeding the S-FCSG and node features into the GCN model, we add a self-loop to each node of the S-FCSG. A 1-D convolutional neural network and fully connected layers are used for further feature extraction and classification, respectively. The experimental results show that our approach enhances the node feature differences in an FCG and that its detection accuracy is higher than that of models using other features, suggesting that malware detection based on graph structures and GNNs leaves considerable room for future study.
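
The role of the self-loop mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard symmetric normalization used in GCNs, a tiny undirected toy graph instead of a real (directed) FCG, a single scalar feature per node, and no learned weights. The function names `add_self_loops`, `normalize`, and `propagate` are hypothetical helpers introduced only for this illustration.

```python
import math

def add_self_loops(adj):
    """Return a copy of a dense adjacency matrix with 1s on the diagonal."""
    n = len(adj)
    return [[1.0 if i == j else float(adj[i][j]) for j in range(n)]
            for i in range(n)]

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (assumed, standard GCN form)."""
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def propagate(adj, feats):
    """One weight-free propagation step: normalized adjacency times features."""
    norm = normalize(adj)
    n, dims = len(feats), len(feats[0])
    return [[sum(norm[i][k] * feats[k][c] for k in range(n))
             for c in range(dims)]
            for i in range(n)]

# Toy graph (assumption): node 0 stands for a "sensitive" function with a
# non-zero feature; nodes 1-3 are neighbours whose feature is zero.
adj = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
feats = [[1.0], [0.0], [0.0], [0.0]]

without_loops = propagate(adj, feats)
with_loops = propagate(add_self_loops(adj), feats)

# Without self-loops, node 0 aggregates only its neighbours, so its own
# feature disappears after one step; with self-loops part of it is retained.
print("node 0 without self-loops:", without_loops[0][0])
print("node 0 with self-loops:   ", with_loops[0][0])
```

In this toy setting, a node without a self-loop is replaced entirely by an average of its neighbours, which is one way distinctive node features can drift toward those of uninformative nodes; adding the self-loop keeps a share of the node's own feature in each propagation step.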

Keywords: Android malware detection; TF–IDF; function call graph; graph convolutional network; self-loop.

Grants and funding

20&ZD293/The National Social Science Fund of China