Using sequence compression to speedup probabilistic profile matching

Bioinformatics. 2005 May 15;21(10):2225-9. doi: 10.1093/bioinformatics/bti323. Epub 2005 Feb 15.

Abstract

Motivation: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is more suitable than a deterministic pattern to retain the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) to match a sequence of N characters against a profile of length P << N.

Results: In this work, we exploit string compression techniques to speedup brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding.

Publication types

  • Comparative Study
  • Evaluation Study

MeSH terms

  • Algorithms*
  • Data Compression / methods*
  • Gene Expression Profiling / methods*
  • Models, Chemical*
  • Models, Statistical*
  • Sequence Alignment / methods*
  • Sequence Analysis / methods*
  • Sequence Homology