Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny

Nat Commun. 2022 Nov 15;13(1):6968. doi: 10.1038/s41467-022-34630-w.

Abstract

Multiple sequence alignments are widely used to infer evolutionary relationships, enabling inferences of structure, function, and phylogeny. Standard practice is to construct one alignment by some preferred method and use it in further analysis; however, undetected alignment bias can be problematic. I describe Muscle5, a novel algorithm which constructs an ensemble of high-accuracy alignments with diverse biases by perturbing a hidden Markov model and permuting its guide tree. Confidence in an inference is assessed as the fraction of the ensemble which supports it. Applied to phylogenetic tree estimation, I show that ensembles can confidently resolve topologies with low bootstrap according to standard methods, and conversely that some topologies with high bootstraps are incorrect. Applied to the phylogeny of RNA viruses, ensemble analysis shows that recently adopted taxonomic phyla are probably polyphyletic. Ensemble analysis can improve confidence assessment in any inference from an alignment.
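
The confidence measure described above, the fraction of the ensemble that supports an inference, can be illustrated with a minimal Python sketch. This is not Muscle5 code. It assumes that a tree has already been estimated from each alignment in the ensemble and reduced to a set of clades, each clade being a set of taxon labels. The function name ensemble_confidence and the example taxa are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

# A clade is represented as the set of taxon labels it contains (an assumption
# made for this sketch, not a Muscle5 data structure).
Clade = FrozenSet[str]


def ensemble_confidence(clade: Clade, replicate_clades: Iterable[Set[Clade]]) -> float:
    """Return the fraction of ensemble replicates whose tree contains `clade`."""
    replicates = list(replicate_clades)
    if not replicates:
        raise ValueError("the ensemble must contain at least one replicate")
    # Count the replicates in which the clade of interest appears.
    supporting = sum(1 for clades in replicates if clade in clades)
    return supporting / len(replicates)


# Hypothetical ensemble of four replicates; three of them recover the clade {A, B}.
AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
AC = frozenset({"A", "C"})
ensemble = [{AB, CD}, {AB}, {AC}, {AB, CD}]

confidence_AB = ensemble_confidence(AB, ensemble)
print(confidence_AB)
```

In this toy ensemble the clade {A, B} appears in three of four replicates, so its confidence is 0.75. The same counting applies to any inference that can be tested for presence or absence in each replicate alignment.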

MeSH terms

  • Algorithms*
  • Biological Evolution*
  • Phylogeny
  • Sequence Alignment
  • Sequence Homology