PMS6: a fast algorithm for motif discovery

Shibdas Bandyopadhyay; Sartaj Sahni; Sanguthevar Rajasekaran

doi:10.1504/IJBRA.2014.062990

PMS6: a fast algorithm for motif discovery

Int J Bioinform Res Appl. 2014;10(4-5):369-83. doi: 10.1504/IJBRA.2014.062990.

Authors

Shibdas Bandyopadhyay¹, Sartaj Sahni¹, Sanguthevar Rajasekaran²

Affiliations

¹ Department of CISE, University of Florida, Gainesville, FL 32611, USA.
² Department of CSE, University of Connecticut, Storrs, CT 06269, USA.

PMID: 24989858
DOI: 10.1504/IJBRA.2014.062990

Abstract

We propose a new algorithm, PMS6, for the (l,d)-motif discovery problem in which we are to find all strings of length l that appear in every string of a given set of strings with at most d mismatches. The run time ratio PMS5/PMS6, where PMS5 is the fastest previously known algorithm for motif discovery in large instances, ranges from a high of 2.20 for the (21,8) challenge instances to a low of 1.69 for the (17,6) challenge instances. Both PMS5 and PMS6 require some amount of pre-processing. The pre-processing time for PMS6 is 34 times faster than that for PMS5 for (23,9) instances. When pre-processing time is factored in, the run time ratio PMS5/PMS6 is as high as 2.75 for (13,4) instances and as low as 1.95 for (17,6) instances.

Keywords: PMS6; approximate string search; bioinformatics; motif discovery; motif search; planted motifs.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Amino Acid Motifs*
Computational Biology / methods*
Models, Statistical
Programming Languages
Sequence Analysis, Protein / methods
Software

Grants and funding

R01-LM010101/LM/NLM NIH HHS/United States