Reconstructing Gene Gains and Losses with BadiRate

Methods Mol Biol. 2022:2569:213-232. doi: 10.1007/978-1-0716-2691-7_10.

Abstract

Estimating gene gains and losses is paramount to understanding the molecular mechanisms underlying adaptive evolution. Despite the advent of high-throughput sequencing, such analyses have so far been hampered by the poor contiguity of genome assemblies. The increasing affordability of long-read sequencing technologies will, however, revolutionize our capacity to identify gene gains and losses at an unprecedented resolution, even in non-model organisms. To thoroughly exploit such multigene family variation, the software BadiRate implements a collection of birth-and-death stochastic models that estimate, by maximum likelihood, the gene turnover rates along the internal and external branches of a given phylogenetic species tree. Its statistical framework also provides the versatility to infer the gene family content at internal phylogenetic nodes (and to estimate the minimum number of gene gains and losses in each branch), to statistically contrast competing hypotheses (e.g., accelerations of the gene turnover rates in predefined clades), and to pinpoint gene family expansions or contractions likely driven by natural selection. In this chapter, we review the theoretical models implemented in BadiRate and illustrate their applicability by analyzing a hypothetical data set of 14 microbial species.
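
To make the birth-and-death idea in the abstract concrete, the following is a minimal Python sketch of a plain linear birth-and-death process acting on the size of one gene family along a single branch. It is not BadiRate's implementation and does not use its interface; the function names (`expected_family_size`, `simulate_family_size`) and all parameter values are hypothetical, and the model is assumed to have per-gene birth and death rates with no other source of new genes.

```python
import math
import random


def expected_family_size(n0, birth, death, t):
    """Expected family size after time t under a linear birth-and-death
    process that starts with n0 genes: n0 * exp((birth - death) * t)."""
    return n0 * math.exp((birth - death) * t)


def simulate_family_size(n0, birth, death, t, rng):
    """Simulate one gene family along a branch of length t (Gillespie style).

    birth and death are per-gene rates (hypothetical values, not estimates).
    A family that reaches size 0 stays at 0 in this simplified model.
    """
    n, elapsed = n0, 0.0
    while n > 0:
        total_rate = n * (birth + death)
        if total_rate == 0:
            break  # no events can occur
        # Waiting time to the next gain or loss event
        elapsed += rng.expovariate(total_rate)
        if elapsed > t:
            break  # next event falls beyond the end of the branch
        # Choose a gain with probability birth / (birth + death)
        if rng.random() < birth / (birth + death):
            n += 1
        else:
            n -= 1
    return n


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    sizes = [simulate_family_size(10, 0.02, 0.01, 50.0, rng) for _ in range(2000)]
    print("simulated mean size:", sum(sizes) / len(sizes))
    print("expected size:     ", expected_family_size(10, 0.02, 0.01, 50.0))
```

Under these assumptions, the simulated mean should approach the analytical expectation as the number of replicates grows. Maximum-likelihood estimation, as described in the abstract, works in the opposite direction: it searches for the rates that make the observed family sizes at the tips of the species tree most probable.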

Keywords: Bioinformatics; Birth and death model; Gene duplication; Gene family; Gene gains; Gene losses; Gene turnover rates.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Evolution, Molecular*
  • Gene Duplication
  • Multigene Family
  • Phylogeny
  • Software*