A multi-pattern hash-binary hybrid algorithm for URL matching in the HTTP protocol

PLoS One. 2017 Apr 11;12(4):e0175500. doi: 10.1371/journal.pone.0175500. eCollection 2017.

Abstract

Building on HEM, our previous multi-pattern binary-matching algorithm for uniform resource locators (URLs), we propose an improved multi-pattern matching algorithm called MH that is based on hash tables and binary tables. The MH algorithm can be applied to network security, data analysis, load balancing, cloud robotic communications, and other fields, all of which require string matching from a fixed starting position. Our approach addresses the performance problems of the classical multi-pattern matching algorithms. This paper explores ways to improve string matching performance under the HTTP protocol by combining a hash method with a binary method, which transforms the symbol-space matching problem into a digital-space problem of numerical-size comparison and hashing. The MH approach matches quickly, requires little memory, performs better than both the classical algorithms and HEM when matching fields in an HTTP stream, and shows great promise for use in real-world applications.
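The abstract does not give MH's data layout, so the following is only a rough sketch of the general idea it describes: hash to a small bucket, then compare strings as numbers with a binary search. It is not the authors' implementation. The names `PREFIX_LEN`, `encode`, `build_index`, and `match_from_start`, the two-byte hash key, and the grouping by pattern length are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict

PREFIX_LEN = 2  # hypothetical: number of leading bytes used as the hash key


def encode(raw: bytes) -> int:
    """Map a byte string to an integer so equal-length strings compare as numbers."""
    return int.from_bytes(raw, "big")


def build_index(patterns):
    """Hash table: leading bytes -> {pattern length -> sorted integer codes}."""
    grouped = defaultdict(lambda: defaultdict(set))
    for pattern in patterns:
        raw = pattern.encode("ascii")
        if len(raw) < PREFIX_LEN:
            raise ValueError("pattern shorter than the hash prefix")
        grouped[raw[:PREFIX_LEN]][len(raw)].add(encode(raw))
    return {
        prefix: {length: sorted(codes) for length, codes in by_len.items()}
        for prefix, by_len in grouped.items()
    }


def match_from_start(index, text):
    """Return every indexed pattern that matches `text` from position 0."""
    raw = text.encode("ascii")
    by_len = index.get(raw[:PREFIX_LEN])  # hashing step
    if by_len is None:
        return []
    hits = []
    for length in sorted(by_len):
        if length > len(raw):
            break
        codes = by_len[length]
        code = encode(raw[:length])
        pos = bisect_left(codes, code)  # binary search on numeric values
        if pos < len(codes) and codes[pos] == code:
            hits.append(raw[:length].decode("ascii"))
    return hits


index = build_index(["example.com/a", "example.com/ab", "test.org/"])
print(match_from_start(index, "example.com/abc"))
```

In this sketch the hash lookup narrows the candidates to patterns sharing the first two bytes, and each remaining check is an integer comparison rather than a character-by-character scan. Whether MH partitions patterns by length or uses a different table layout is not stated in the abstract.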

MeSH terms

  • Algorithms*
  • Internet*
  • Programming Languages*

Grants and funding

This work was supported by the Scientific Research Fund of the Hunan Provincial Education Department (No. 15A007) and the National Natural Science Foundation of China (No. 61202116). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.