On pattern matching with k mismatches and few don't cares

Inf Process Lett. 2017 Feb;118:78-82. doi: 10.1016/j.ipl.2016.10.003. Epub 2016 Oct 27.

Abstract

We consider the problem of pattern matching with k mismatches, where the pattern may contain don't care (wild card) characters. Specifically, given a pattern P of length m and a text T of length n, we want to find all occurrences of P in T that have at most k mismatches. The pattern can contain don't care characters, which match any character. Without don't cares, the best known algorithm for pattern matching with k mismatches has a runtime of [Formula: see text]. With don't cares in the pattern, the best deterministic algorithm has a runtime of O(nk polylog m). There is therefore a significant gap between the versions with and without don't cares. In this paper, we give an algorithm whose runtime increases with the number of don't cares. We define an island to be a maximal substring of P that contains no don't cares. Let q be the number of islands in P. We present an algorithm that runs in [Formula: see text] time. If the number of islands q is O(k), this runtime becomes [Formula: see text], which essentially matches the best known runtime for pattern matching with k mismatches without don't cares. If the number of islands q is O(k^2), this algorithm is asymptotically faster than the previous best algorithm for pattern matching with k mismatches with don't cares in the pattern.
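To make the problem statement and the definition of an island concrete, here is a minimal Python sketch. It is only a naive O(nm) reference check of the definitions, not the algorithm presented in the paper. The wildcard symbol `?` and the function names `islands` and `k_mismatch_occurrences` are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical don't care symbol; the abstract does not fix a concrete character.
DONT_CARE = "?"


def islands(pattern: str, wildcard: str = DONT_CARE) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of the maximal substrings of
    `pattern` that contain no wildcard; q is the length of this list."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, ch in enumerate(pattern):
        if ch == wildcard:
            if start is not None:  # a wildcard closes the current island
                spans.append((start, i))
                start = None
        elif start is None:  # first non-wildcard after a gap opens an island
            start = i
    if start is not None:  # island running to the end of the pattern
        spans.append((start, len(pattern)))
    return spans


def k_mismatch_occurrences(
    text: str, pattern: str, k: int, wildcard: str = DONT_CARE
) -> List[int]:
    """Naive baseline: every alignment i where `pattern` matches
    text[i:i+m] with at most k mismatches; wildcards match anything."""
    n, m = len(text), len(pattern)
    result: List[int] = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if pattern[j] != wildcard and pattern[j] != text[i + j]:
                mismatches += 1
                if mismatches > k:  # stop early once the budget is exceeded
                    break
        if mismatches <= k:
            result.append(i)
    return result
```

For example, the pattern `ab?c??d` has q = 3 islands (`ab`, `c`, `d`), and searching for `ab?d` in `abcdabef` reports alignment 0 with k = 0 and alignments 0 and 4 with k = 1 (the second alignment differs from the pattern only in its last character).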

Keywords: k mismatches with don’t cares in the pattern; k mismatches with wild cards; pattern matching with k mismatches and don’t cares.