Genome Rearrangement Distance With a Flexible Intergenic Regions Aspect

IEEE/ACM Trans Comput Biol Bioinform. 2023 May-Jun;20(3):1641-1653. doi: 10.1109/TCBB.2022.3165443. Epub 2023 Jun 5.

Abstract

Most mathematical models for genome rearrangement problems have considered only gene order. Under such models, the rearrangement distance for a given set of events, such as reversals and transpositions, is commonly defined as the minimum number of rearrangement events that transform the gene order of a genome G1 into the gene order of a genome G2. Recent work has begun to incorporate additional information, such as the sizes of the intergenic regions (i.e., the number of nucleotides between pairs of consecutive genes), which yields good results for distances estimated on real data. In these models, besides transforming the gene order, the sequence of rearrangement events must also transform the list of intergenic region sizes of G1 into the list of intergenic region sizes of G2 (the target list). We study a new variation in which the target list is flexible, in the sense that each target intergenic region size may take any value within a range of acceptable values. This lets us model scenarios where the main objective is still to transform the gene order of the source genome into that of the target genome, while allowing flexibility in the sizes of the intergenic regions, since the nucleotides in these regions tend to undergo more changes than those in genes. We investigate the rearrangement distance for three sets of events: one using only reversals, one using only transpositions, and one allowing both. We present approximation algorithms for the problems and an NP-hardness proof. Our results rely on the Flexible Weighted Cycle Graph, adapted from the breakpoint graph to deal with flexible intergenic region sizes.
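To make the model described above concrete, the following is a minimal Python sketch of one reversal acting on a genome with intergenic regions, followed by a check against a flexible target. It is an illustration under stated assumptions, not the algorithms from the paper: the function names `reversal` and `matches_flexible_target` are hypothetical, genes are assumed to be signed integers (so a reversal flips orientation), and a genome with n genes is assumed to carry n + 1 intergenic regions, with `regions[k]` lying immediately before `genes[k]` and `regions[n]` after the last gene. The reversal cuts inside two intergenic regions, so the total number of nucleotides is conserved.

```python
from typing import List, Tuple


def reversal(genes: List[int], regions: List[int],
             i: int, j: int, x: int, y: int) -> Tuple[List[int], List[int]]:
    """Reverse genes[i..j] (0-indexed, inclusive), cutting regions[i] after
    x nucleotides and regions[j + 1] after y nucleotides.

    Assumed representation (not taken from the paper): signed genes and
    len(regions) == len(genes) + 1.
    """
    n = len(genes)
    if len(regions) != n + 1:
        raise ValueError("expected one more intergenic region than genes")
    if not (0 <= i <= j < n):
        raise ValueError("need 0 <= i <= j < number of genes")
    if not (0 <= x <= regions[i] and 0 <= y <= regions[j + 1]):
        raise ValueError("cut points must fall inside the affected regions")

    # The segment is reversed and each gene changes orientation.
    new_genes = genes[:i] + [-g for g in reversed(genes[i:j + 1])] + genes[j + 1:]

    # The left piece of regions[i] (x) joins the left piece of regions[j + 1] (y);
    # the two remaining pieces join on the other side of the segment.
    left = x + y
    right = (regions[i] - x) + (regions[j + 1] - y)
    inner = list(reversed(regions[i + 1:j + 1]))  # regions inside the segment
    new_regions = regions[:i] + [left] + inner + [right] + regions[j + 2:]
    return new_genes, new_regions


def matches_flexible_target(genes: List[int], regions: List[int],
                            target_genes: List[int],
                            target_ranges: List[Tuple[int, int]]) -> bool:
    """True if the gene order equals the target and every intergenic region
    size lies in its acceptable range (lo, hi), both ends inclusive."""
    if genes != target_genes or len(regions) != len(target_ranges):
        return False
    return all(lo <= size <= hi for size, (lo, hi) in zip(regions, target_ranges))


# Usage: one reversal sorts the genes and lands every region in its range.
source_genes = [1, -3, -2, 4]
source_regions = [5, 2, 7, 3, 4]
genes, regions = reversal(source_genes, source_regions, i=1, j=2, x=1, y=2)
ok = matches_flexible_target(
    genes, regions,
    target_genes=[1, 2, 3, 4],
    target_ranges=[(4, 6), (2, 4), (6, 8), (1, 3), (4, 4)],
)
```

In the rigid intergenic model, each range would collapse to a single value (lo equal to hi); the flexible variant only widens the acceptance test, while the events themselves are unchanged in this sketch.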

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Gene Rearrangement* / genetics
  • Genome
  • Genomics* / methods
  • Models, Genetic
  • Nucleotides

Substances

  • Nucleotides