We propose a new algorithm, PMS6, for the (l,d)-motif discovery problem in which we are to find all strings of length l that appear in every string of a given set of strings with at most d mismatches. The run time ratio PMS5/PMS6, where PMS5 is the fastest previously known algorithm for motif discovery in large instances, ranges from a high of 2.20 for the (21,8) challenge instances to a low of 1.69 for the (17,6) challenge instances. Both PMS5 and PMS6 require some amount of pre-processing. The pre-processing time for PMS6 is 34 times faster than that for PMS5 for (23,9) instances. When pre-processing time is factored in, the run time ratio PMS5/PMS6 is as high as 2.75 for (13,4) instances and as low as 1.95 for (17,6) instances.
Keywords: PMS6; approximate string search; bioinformatics; motif discovery; motif search; planted motifs.