Efficient Algorithms for Genomic Duplication Models

IEEE/ACM Trans Comput Biol Bioinform. 2018 Sep-Oct;15(5):1515-1524. doi: 10.1109/TCBB.2017.2706679. Epub 2017 May 23.

Abstract

An important issue in evolutionary molecular biology is to discover genomic duplication episodes and their correspondence to the species tree. Existing approaches vary in the two fundamental aspects: the choice of evolutionary scenarios that model allowed locations of duplications in the species tree, and the rules of clustering gene duplications from gene trees into a single multiple duplication event. Here we study the method of clustering called minimum episodes for several models of allowed evolutionary scenarios with a focus on interval models in which every gene duplication has an interval consisting of allowed locations in the species tree. We present mathematical foundations for general genomic duplication problems. Next, we propose the first linear time and space algorithm for minimum episodes clustering jointly for any interval model and the algorithm for the most general model in which every evolutionary scenario is allowed. We also present a comparative study of different models of genomic duplication based on simulated and empirical datasets. We provided algorithms and tools that could be applied to solve efficiently minimum episodes clustering problems. Our comparative study helps to identify which model is the most reasonable choice in inferring genomic duplication events.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Computational Biology / methods*
  • Gene Duplication / genetics*
  • Genome / genetics*
  • Models, Genetic*
  • Phylogeny