Influence of substitution model selection on protein phylogenetic tree reconstruction

Gene. 2023 May 20:865:147336. doi: 10.1016/j.gene.2023.147336. Epub 2023 Mar 3.

Abstract

Probabilistic phylogenetic tree reconstruction is traditionally performed under a best-fitting substitution model of molecular evolution, selected beforehand according to diverse statistical criteria. Some recent studies have proposed that this selection step is unnecessary for phylogenetic tree reconstruction, which has led to a debate in the field. In contrast to DNA sequences, phylogenetic tree reconstruction from protein sequences is traditionally based on empirical exchangeability matrices that can differ among taxonomic groups and protein families. Considering this aspect, here we investigated the influence of selecting a substitution model of protein evolution on phylogenetic tree reconstruction through analyses of real and simulated data. We found that phylogenetic trees reconstructed under a selected best-fitting substitution model of protein evolution are the most accurate, in terms of topology and branch lengths, compared with those derived from substitution models whose amino acid replacement matrices are far from that of the selected best-fitting model, especially when the data have large genetic diversity. Moreover, we found that substitution models with similar amino acid replacement matrices produce similar reconstructed phylogenetic trees, which suggests using a substitution model as similar as possible to the selected best-fitting model when the latter cannot be used. Therefore, we recommend the traditional protocol of selection among substitution models of evolution for protein phylogenetic tree reconstruction.
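As an illustration of what "selecting a best-fitting substitution model according to statistical criteria" can look like in practice, the sketch below ranks candidate models by AIC or BIC. It is not the authors' pipeline: the abstract does not name the criteria or software used, the function names are hypothetical, and the log-likelihoods and parameter counts are made-up placeholders rather than results from the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def select_best_model(fits, n_sites, criterion="bic"):
    """Return the lowest-scoring model name and all scores.

    fits maps a model name to (log_likelihood, n_free_params).
    """
    scores = {}
    for name, (lnl, k) in fits.items():
        if criterion == "bic":
            scores[name] = bic(lnl, k, n_sites)
        else:
            scores[name] = aic(lnl, k)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical placeholder values, NOT data from the paper.
# Parameter counts assume 37 branch lengths (20 taxa, unrooted tree),
# plus one gamma shape parameter for the "+G" model.
example_fits = {
    "JTT": (-10250.0, 37),
    "WAG": (-10310.0, 37),
    "LG+G": (-10100.0, 38),
}

best_model, scores = select_best_model(example_fits, n_sites=300)
print(best_model)
```

In a real analysis the log-likelihoods would come from fitting each candidate model to the alignment with a phylogenetics program; only the ranking step is sketched here.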

Keywords: Molecular evolution; Phylogenetic tree reconstruction; Phylogenetics; Protein evolution; Substitution model selection; Substitution models of protein evolution.

MeSH terms

  • Amino Acids*
  • Base Sequence
  • Evolution, Molecular*
  • Models, Genetic
  • Phylogeny

Substances

  • Amino Acids