Taming the Duplication-Loss-Coalescence Model with Integer Linear Programming

J Comput Biol. 2021 Aug;28(8):758-773. doi: 10.1089/cmb.2021.0011. Epub 2021 Apr 16.

Abstract

The duplication-loss-coalescence (DLC) parsimony model is invaluable for analyzing complex scenarios of concurrent duplication, loss, and deep coalescence events in the evolution of gene families. However, inferring such scenarios is computationally prohibitive even for moderately sized families. To overcome this stringent limitation, we take a first step by describing a flexible integer linear programming (ILP) formulation for inferring DLC evolutionary scenarios. Then, to make the DLC model more scalable, we introduce four sensibly constrained versions of the model and describe modified versions of our ILP formulation that reflect these constraints. Our simulation studies show that the constrained ILP formulations can compute substantially larger evolutionary scenarios than those computable under either our original ILP formulation or the original dynamic programming algorithm by Wu et al. Furthermore, scenarios computed under our constrained DLC models are remarkably accurate compared with the corresponding scenarios under the original DLC model, a finding we also confirm in an empirical study with thousands of gene families.

Keywords: DLC; ILP; coalescence; duplications; losses; phylogenetics; reconciliation.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Evolution, Molecular
  • Gene Duplication
  • Models, Genetic
  • Multigene Family*
  • Phylogeny
  • Programming, Linear