Minimizing genomic duplication episodes

Comput Biol Chem. 2020 Dec:89:107260. doi: 10.1016/j.compbiolchem.2020.107260. Epub 2020 Apr 26.

Abstract

Background: The study of genomic duplications is fundamental to understanding the process of evolution. In evolutionary molecular biology, many approaches focus on detecting gene duplications and multiple gene duplication episodes and on locating them in the Tree of Life. To reconstruct such episodes, one can cluster single gene duplications inferred by reconciling a set of gene trees with a species tree.
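As an illustration of how single gene duplications can be inferred by reconciliation, the following minimal sketch uses the standard least common ancestor (LCA) mapping. It is not taken from the paper's software: the nested-tuple tree encoding and the function names species_clusters, lca_map and duplications are assumptions made for this example only. A gene tree node is counted as a duplication when it maps to the same species tree node as one of its children.

```python
def species_clusters(tree):
    """List the clusters (frozensets of leaf labels) of a binary
    nested-tuple species tree; the root cluster is the last element."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    cl, cr = species_clusters(left), species_clusters(right)
    return cl + cr + [cl[-1] | cr[-1]]


def lca_map(labels, s_clusters):
    """Smallest species cluster containing all given labels (the LCA)."""
    return min((c for c in s_clusters if labels <= c), key=len)


def duplications(gene_tree, s_clusters):
    """Return (species cluster the gene tree root maps to,
    list of species clusters at which duplications are inferred)."""
    if isinstance(gene_tree, str):
        # A gene tree leaf is labeled by the species it was sampled from.
        return frozenset([gene_tree]), []
    left, right = gene_tree
    ml, dl = duplications(left, s_clusters)
    mr, dr = duplications(right, s_clusters)
    m = lca_map(ml | mr, s_clusters)
    dups = dl + dr
    if m == ml or m == mr:
        # The node maps to the same place as a child: duplication.
        dups.append(m)
    return m, dups


# Hypothetical toy data: species tree ((a,b),c), gene tree ((a,b),(a,c)).
S = (("a", "b"), "c")
G = (("a", "b"), ("a", "c"))
root_map, dups = duplications(G, species_clusters(S))
```

In this toy input, only the gene tree root is a duplication, and it maps to the root of the species tree. Clustering such duplications from many gene trees into episodes is the subject of the paper and is not attempted here.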

Results: We propose an efficient quadratic-time algorithm for the genomic duplication clustering problem in which input gene trees are rooted, episode locations are restricted so as to preserve the minimal number of single gene duplications, and clustering rules follow the minimum episodes method. The objective follows a recently introduced approach: minimize the maximal number of duplication episodes on a single path, called here the MP score. Based on our theoretical results, we show new algorithmic relationships between the MP score and the minimum episodes (ME) score, defined as the minimal number of duplication episodes.
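To make the two quantities concrete, the sketch below evaluates them for one fixed clustering: the total number of episodes (the quantity that the ME score minimizes) and the maximal number of episodes on a single root-to-leaf path (the quantity that the MP score minimizes). It does not implement the paper's quadratic-time optimization. The nested-tuple encoding, the episodes dictionary and the function name total_and_max_path are assumptions made for this example.

```python
def total_and_max_path(tree, episodes):
    """Return (total episodes in the subtree,
    maximal number of episodes on a path from this node to a leaf).

    tree: nested tuples with string leaves (hypothetical encoding).
    episodes: dict mapping a node (leaf string or subtree tuple) to the
    number of duplication episodes assigned to it; missing nodes count 0.
    """
    here = episodes.get(tree, 0)
    if isinstance(tree, str):
        return here, here
    totals, paths = zip(*(total_and_max_path(c, episodes) for c in tree))
    return here + sum(totals), here + max(paths)


# Hypothetical clustering on the species tree ((a,b),c):
# 1 episode at the root, 2 at the ancestor of a and b, 1 at leaf c.
S = (("a", "b"), "c")
episodes = {S: 1, ("a", "b"): 2, "c": 1}
total, max_path = total_and_max_path(S, episodes)
```

Here the total is 4 episodes, while the heaviest path (root, then the ancestor of a and b) carries 3. The optimization problems ask for the clusterings that make these values as small as possible under the model's constraints.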

Conclusions: Our evaluation on three empirical datasets demonstrates that, under the model in which the minimal number of duplications is preserved, the duplication clusterings with minimal MP score support the clusterings with the minimal total number of duplication episodes.

Availability: The software is available at https://bitbucket.org/pgor17/rmp.

Keywords: Duplication episode; Genomic duplication; Maximal path; Minimum episodes problem; Reconciliation; Species tree.

MeSH terms

  • Algorithms*
  • Databases, Genetic / statistics & numerical data
  • Evolution, Molecular
  • Gene Duplication*
  • Models, Genetic*