Heuristics for Genome Rearrangement Distance With Replicated Genes

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2094-2108. doi: 10.1109/TCBB.2021.3095021. Epub 2021 Dec 8.

Abstract

In comparative genomics, one goal is to find similarities between genomes of different organisms. Comparisons using genome features like genes, gene order, and regulatory sequences are carried out with this purpose in mind. Genome rearrangements are mutational events that affect large extensions of the genome. They are responsible for creating extant species with conserved genes in different positions across genomes. Close species - from an evolutionary point of view - tend to have the same set of genes or share most of them. When we consider gene order to compare two genomes, it is possible to use a parsimony criterion to estimate how close the species are. We are interested in the shortest sequence of genome rearrangements capable of transforming one genome into the other, which is named rearrangement distance. Reversal is one of the most studied genome rearrangements events. This event acts in a segment of the genome, inverting the position and the orientation of genes in it. Transposition is another widely studied event. This event swaps the position of two consecutive segments of the genome. When the genome has no gene repetition, a common approach is to map it as a permutation such that each element represents a conserved block. When genomes have replicated genes, this mapping is usually performed using strings. The number of replicas depends on the organisms being compared, but in many scenarios, it tends to be small. In this work, we study the rearrangement distance between genomes with replicated genes considering that the orientation of genes is unknown. We present four heuristics for the problem of genome rearrangement distance with replicated genes. We carry out experiments considering the exclusive use of the reversals or transpositions events, as well as the version in which both events are allowed. We developed a database of simulated genomes and compared our results with other algorithms from the literature. The experiments showed that our heuristics with more sophisticated rules presented a better performance than the known algorithms to estimate the evolutionary distance between genomes with replicated genes. In order to validate the application of our algorithms in real data, we construct a phylogenetic tree based on the distance provided by our algorithm and compare it with a know tree from the literature.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Gene Dosage / genetics
  • Gene Rearrangement / genetics*
  • Genomics / methods*
  • Heuristics