A Space-Bounded Anytime Algorithm for the Multiple Longest Common Subsequence Problem

IEEE Trans Knowl Data Eng. 2014 Nov;26(11):2599-2609. doi: 10.1109/TKDE.2014.2304464.

Abstract

The multiple longest common subsequence (MLCS) problem, related to the identification of sequence similarity, is an important problem in many fields. As an NP-hard problem, its exact algorithms have difficulty in handling large-scale data and time- and space-efficient algorithms are required in real-world applications. To deal with time constraints, anytime algorithms have been proposed to generate good solutions with a reasonable time. However, there exists little work on space-efficient MLCS algorithms. In this paper, we formulate the MLCS problem into a graph search problem and present two space-efficient anytime MLCS algorithms, SA-MLCS and SLA-MLCS. SA-MLCS uses an iterative beam widening search strategy to reduce space usage during the iterative process of finding better solutions. Based on SA-MLCS, SLA-MLCS, a space-bounded algorithm, is developed to avoid space usage from exceeding available memory. SLA-MLCS uses a replacing strategy when SA-MLCS reaches a given space bound. Experimental results show SA-MLCS and SLA-MLCS use an order of magnitude less space and time than the state-of-the-art approximate algorithm MLCS-APP while finding better solutions. Compared to the state-of-the-art anytime algorithm Pro-MLCS, SA-MLCS and SLA-MLCS can solve an order of magnitude larger size instances. Furthermore, SLA-MLCS can find much better solutions than SA-MLCS on large size instances.

Keywords: Heuristic search; anytime algorithm; multiple longest common subsequence (MLCS); space bounded.