Easy identification of generalized common and conserved nested intervals

J Comput Biol. 2014 Jul;21(7):520-33. doi: 10.1089/cmb.2013.0146. Epub 2014 Mar 20.

Abstract

In this article we explain how to easily compute gene clusters, formalized by classical or generalized nested common or conserved intervals, between a set of K genomes represented as K permutations. A b-nested common (resp. conserved) interval I of size |I| is either an interval of size 1 or a common (resp. conserved) interval that contains another b-nested common (resp. conserved) interval of size at least |I|-b. When b=1, this corresponds to the classical notion of nested interval. We exhibit two simple algorithms to output all b-nested common or conserved intervals between K permutations in O(Kn+nocc) time, where nocc is the total number of such intervals. We also explain how to count all b-nested intervals in O(Kn) time. New properties of the family of conserved intervals are proposed to do so.

Keywords: algorithms; alignment; automata; combinatorics; strings.

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Genomics / methods*
  • Multigene Family*