Approaching the One-Sided Exemplar Adjacency Number Problem

IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):1946-1954. doi: 10.1109/TCBB.2019.2913834. Epub 2020 Dec 8.

Abstract

The one-sided Exemplar Adjacency Number (EAN) is a known problem for computing the exemplar similarity between a generic linear genome G with gene duplications and an exemplar genome H (over the same set of n gene families). In this problem, we need to compute an exemplar genome G, which is a permutation obtained from G, such that the number of common adjacencies between G and H is maximized. Unfortunately, the problem is not only NP-hard but also NP-hard to approximate. In this paper, we approach the problem by relaxing the constraint such that a sub-permutation G+ obtained from G does not have to include all the gene families, but still needs to have a length at least k. Hence G+ is called a pseudo-exemplar genome. Then, a slightly more general problem (One-sided EAN+) is defined: compute a pseudo-exemplar genome G+ from G such that the number of common adjacencies between H and G+ is maximized. Certainly One-sided EAN+ contains One-sided EAN as a special case; moreover, it presents some flexibility in designing algorithms. First, we relax and formulate the One-sided EAN+ problem as the maximum independent set (MIS) on a colored interval graph and hence reduce the appearance of each gene to at most two times. We show that this new relaxation is still NP-complete, though a simple factor-2 approximation algorithm can be designed; moreover, we also prove that the problem cannot be approximated within 2-ε by a local search technique. We then show that this relaxed version is fixed-parameter tractable (FPT). Second, to ensure that each gene appears in G+ at most once, we use integer linear programming (ILP) to solve this problem. Finally, we implement our algorithm and compare it with the up-to-date software GREDU, with simulated signed and unsigned genomes. It turns out that our algorithm is more stable and can process genomes of length up to 12,000 for signed genomes (while GREDU can falter on such a large signed genome and it cannot handle unsigned genomes at all).

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computer Simulation
  • Gene Duplication / genetics
  • Genome / genetics*
  • Genomics / methods*
  • Programming, Linear
  • Software