A Type-Aware Approach to Message Clustering for Protocol Reverse Engineering

Sensors (Basel). 2019 Feb 10;19(3):716. doi: 10.3390/s19030716.

Abstract

Protocol Reverse Engineering (PRE) is crucial for information security of Internet-of-Things (IoT), and message clustering determines the effectiveness of PRE. However, the quality of services still lags behind the strict requirement of IoT applications as the results of message clustering are often coarse-grained with the intrinsic type information hidden in messages largely ignored. Aiming at this problem, this study proposes a type-aware approach to message clustering guided by type information. The approach regards a message as a combination of n-grams, and it employs the Latent Dirichlet Allocation (LDA) model to characterize messages with types and n-grams via inferring the type distribution of each message. The type distribution is finally used to measure the similarity of messages. According to this similarity, the approach clusters messages and further extracts message formats. Experimental results of the approach against Netzob in terms of a number of protocols indicate that the correctness and conciseness can be significantly improved, e.g., figures 43.86% and 3.87%, respectively for the CoAP protocol.

Keywords: Internet of Things; information security; message clustering; protocol reverse engineering.