Generalizing the Domain-Gene-Species Reconciliation Framework to Microbial Genes and Domains

IEEE/ACM Trans Comput Biol Bioinform. 2023 Nov-Dec;20(6):3511-3522. doi: 10.1109/TCBB.2023.3294480. Epub 2023 Dec 25.

Abstract

Protein domains play an important role in the function and evolution of many gene families. Previous studies have shown that domains are frequently lost or gained during gene family evolution. Yet most computational approaches for studying gene family evolution do not account for domain-level evolution within genes. To address this limitation, a three-level reconciliation framework, the Domain-Gene-Species (DGS) reconciliation model, was recently developed to simultaneously model the evolution of a domain family inside one or more gene families and the evolution of those gene families inside a species tree. However, the existing model applies only to multicellular eukaryotes, where horizontal gene transfer is negligible. In this work, we generalize the existing DGS reconciliation model by allowing genes and domains to spread across species boundaries through horizontal transfer. We show that the problem of computing optimal generalized DGS reconciliations, though NP-hard, is approximable to within a constant factor, where the specific approximation ratio depends on the "event costs" used. We provide two different approximation algorithms for the problem and demonstrate the impact of the generalized framework using both simulated and real biological data. Our results show that the new algorithms yield highly accurate reconstructions of domain family evolution for microbes.
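To illustrate what "event costs" mean in a parsimony-style reconciliation, the following minimal Python sketch scores a candidate reconciliation as a weighted count of its events. It is not the authors' algorithm. The event names (duplication, transfer, loss), the cost values, and the function names `reconciliation_cost` and `dgs_cost` are all assumptions made for illustration, since the abstract does not specify them. Summing the domain-gene and gene-species levels is likewise a simplifying assumption about how the two levels combine.

```python
from collections import Counter

# Hypothetical event costs; the abstract does not state the values used.
EVENT_COSTS = {"duplication": 2.0, "transfer": 3.0, "loss": 1.0}


def reconciliation_cost(events, costs=EVENT_COSTS):
    """Return the weighted cost of one reconciliation, given its event labels."""
    counts = Counter(events)
    unknown = set(counts) - set(costs)
    if unknown:
        raise ValueError(f"no cost defined for events: {sorted(unknown)}")
    return sum(costs[event] * n for event, n in counts.items())


def dgs_cost(domain_gene_events, gene_species_events, costs=EVENT_COSTS):
    """Assumed combination rule: add the costs of the two reconciliation levels."""
    return (reconciliation_cost(domain_gene_events, costs)
            + reconciliation_cost(gene_species_events, costs))


if __name__ == "__main__":
    # One domain duplication and one domain loss inside the gene trees,
    # plus one horizontal gene transfer inside the species tree.
    print(dgs_cost(["duplication", "loss"], ["transfer"]))
```

Under this kind of scoring, an optimal reconciliation is one that minimizes the total cost. Changing the relative costs changes which evolutionary scenario is preferred, which is consistent with the abstract's remark that the approximation ratio depends on the event costs used.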

MeSH terms

  • Algorithms
  • Evolution, Molecular*
  • Gene Duplication*
  • Gene Transfer, Horizontal / genetics
  • Genes, Microbial
  • Models, Genetic
  • Phylogeny