TagSeq: Malicious behavior discovery using dynamic analysis

PLoS One. 2022 May 16;17(5):e0263644. doi: 10.1371/journal.pone.0263644. eCollection 2022.

Abstract

In recent years, studies on malware analysis have noticeably increased in the cybersecurity community. Most recent studies concentrate on malware classification and detection or on identifying malicious patterns, but describing malware activity with high-level semantics still relies heavily on manual analysis. We develop a sequence-to-sequence (seq2seq) neural network, called TagSeq, that examines a sequence of Windows API calls recorded during malware execution and produces tags that label the malicious behavior. We propose embedding modules that transform Windows API function parameters, registry, filenames, and URLs into low-dimensional vectors while preserving the closeness property. Moreover, we use an attention mechanism to capture the relations between generated tags and specific API calls. Results show that TagSeq identifies the most likely malicious actions. Examples and a case study demonstrate that the proposed embedding modules preserve semantic-physical relations and that the predicted tags reflect malicious intentions. We believe this work is suitable as a tool to help security analysts recognize malicious behavior and intent through easy-to-understand tags.
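
As an illustration of how an attention mechanism can relate a generated tag to individual API calls, the following minimal sketch computes attention weights of one decoder state over a short encoded call trace. It is not the TagSeq implementation: the dot-product scoring, the three-dimensional vectors, the chosen API names, and the helper names `softmax` and `attend` are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention weights of one decoder state over encoder states.

    Assumption: dot-product scoring is used here only for illustration;
    the abstract does not state which scoring function TagSeq uses.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


# Hypothetical toy trace: three Windows API calls with made-up encodings.
api_calls = ["RegOpenKeyExW", "CreateFileW", "InternetOpenUrlW"]
encoded = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical decoder state while emitting a network-related tag.
decoder_state = [0.1, 0.2, 2.0]

weights = attend(decoder_state, encoded)
top_call = api_calls[weights.index(max(weights))]

for name, w in zip(api_calls, weights):
    print(f"{name}: {w:.3f}")
print("most attended call:", top_call)
```

In this toy setup the largest weight falls on the call whose encoding best matches the decoder state, which is the kind of tag-to-call relation an analyst could inspect.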

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Security*
  • Neural Networks, Computer*
  • Records
  • Registries
  • Semantics

Grants and funding

This study was supported by the Ministry of Science and Technology, R.O.C., in the form of grants awarded to M.C.C. (109-2221-E-001-010-MY3) and Y.S.S. (108-2218-E-002-045). No additional external funding was received for this study.