Offset-FA: A Uniform Method to Handle Both Unbounded and Bounded Repetitions in Regular Expression Matching

Sensors (Basel). 2022 Oct 13;22(20):7781. doi: 10.3390/s22207781.

Abstract

With the exponential growth of cyber-physical systems (CPSs), security challenges have emerged; attacks on critical infrastructure could result in catastrophic consequences. Intrusion detection is the foundation for CPS security protection, and deep-packet inspection is the primary method for signature-matched mechanisms. This method usually employs regular expression matching (REM) to detect possible threats in the packet payload. State explosion is the critical challenge for REM applications, which originates primarily from features of large character sets with unbounded (closures) or bounded (counting) repetitions. In this work, we propose Offset-FA to handle these repetitions in a uniform mechanism. Offset-FA eliminates state explosion by extracting the repetitions from the nonexplosive string fragments. Then, these fragments are compiled into a fragment-DFA, while a fragment relation table and a reset table are constructed to preserve their connection and offset relationship. To our knowledge, Offset-FA is the first automaton to handle these two kinds of repetitions together with a uniform mechanism. Experiments demonstrate that Offset-FA outperforms state-of-the-art solutions in both space cost and matching speed on the premise of matching correctness, and achieves a comparable matching speed with that of DFA on practical rule sets.

Keywords: CPS security; deep-packet inspection; regular expression matching; state explosion.