Reversal and Indel Distance With Intergenic Region Information

IEEE/ACM Trans Comput Biol Bioinform. 2023 May-Jun;20(3):1628-1640. doi: 10.1109/TCBB.2022.3215615. Epub 2023 Jun 5.

Abstract

Recent works on genome rearrangements have shown that incorporating intergenic region information along with gene order in models provides better estimations for the rearrangement distance than using gene order alone. The reversal distance is one of the main problems in genome rearrangements. It has a polynomial time algorithm when only gene order is used to model genomes, assuming that repeated genes do not exist and that gene orientation is known, even when the genomes have distinct gene sets. The reversal distance is NP-hard and has a 2-approximation algorithm when incorporating intergenic regions. However, the problem has only been studied assuming genomes with the same set of genes. In this work, we consider the variation that incorporates intergenic regions and that allows genomes to have distinct sets of genes, a scenario that leads us to include indels operations (insertions and deletions). We present a 2.5-approximation algorithm using the labeled intergenic breakpoint graph, which is based on the well-known breakpoint graph structure. We also present an experimental analysis of the proposed algorithm using simulated data, which showed that the practical approximation factor is considerably less than 2.5. Furthermore, we used the algorithm in real genomes to construct a phylogenetic tree.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Gene Rearrangement
  • Genome*
  • INDEL Mutation / genetics
  • Models, Genetic*
  • Phylogeny