How Often Does Filtering of Alignment Columns Improve the Phylogenetic Inference of Two-Domain Proteins?

Biochemistry (Mosc). 2022 Dec;87(12):1689-1698. doi: 10.1134/S0006297922120239.

Abstract

e-mail: sas@belozersky.msu.ru

Protein phylogeny is usually reconstructed from a multiple alignment of amino acid sequences. One problem with such alignments is that they contain regions with different degrees of conservation, including regions where the quality of the alignment is questionable. This problem is often addressed by filtering the alignment columns with software developed specifically for this purpose. In this work, we investigated various approaches to phylogeny reconstruction, using proteins with two evolutionary domains as examples. The sequences of such proteins are inherently heterogeneous in their degree of conservation because they contain both the evolutionary domains and the linkers between them, as well as the N- and C-termini. We show that filtering the alignment columns improved the quality of reconstruction on average only when full-length sequences were used, and only for eukaryotic proteins. Restricting the alignment to the evolutionary domains, with the less conserved linkers and terminal sequences discarded, worsened the quality of phylogenetic reconstruction on average.
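The filtering criteria are not given here, so the following is only a minimal sketch of the two alignment treatments the abstract compares: filtering columns, and restricting the alignment to domain columns. It assumes a simple gap-fraction threshold, which is a deliberately simplified stand-in for dedicated tools such as trimAl, Gblocks or BMGE that use more elaborate criteria. The function names, the 0.5 threshold, the dictionary representation of the alignment, and the assumption that domain boundaries are already mapped onto alignment columns are all hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical representation: sequence name -> aligned sequence; '-' marks a gap.
Alignment = Dict[str, str]


def alignment_length(alignment: Alignment) -> int:
    """Return the common length of all aligned sequences, or raise if they differ."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return lengths.pop()


def filter_columns_by_gaps(alignment: Alignment, max_gap_fraction: float = 0.5) -> Alignment:
    """Drop every column in which the fraction of gaps exceeds max_gap_fraction.

    Simplified, hypothetical criterion; real filtering tools also weigh
    residue similarity and alignment ambiguity, not only gaps.
    """
    n_seqs = len(alignment)
    keep = [
        col
        for col in range(alignment_length(alignment))
        if sum(seq[col] == "-" for seq in alignment.values()) / n_seqs <= max_gap_fraction
    ]
    return {name: "".join(seq[col] for col in keep) for name, seq in alignment.items()}


def crop_to_domains(alignment: Alignment, domains: Sequence[Tuple[int, int]]) -> Alignment:
    """Keep only columns inside the given domain ranges (0-based, half-open
    [start, end)), discarding linkers and termini; the domain segments are
    concatenated in the order given."""
    length = alignment_length(alignment)
    for start, end in domains:
        if not 0 <= start < end <= length:
            raise ValueError(f"invalid domain range: ({start}, {end})")
    return {
        name: "".join(seq[start:end] for start, end in domains)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    # Toy alignment (made up for illustration): two "domains" in columns 0-3
    # and 9-10, separated by a gappy "linker" region.
    aln = {
        "seqA": "MK-LVDE--AG",
        "seqB": "MKALVDEGGAG",
        "seqC": "-K-LIDE--SG",
    }
    print(filter_columns_by_gaps(aln, max_gap_fraction=0.5))
    # {'seqA': 'MKLVDEAG', 'seqB': 'MKLVDEAG', 'seqC': '-KLIDESG'}
    print(crop_to_domains(aln, [(0, 4), (9, 11)]))
    # {'seqA': 'MK-LAG', 'seqB': 'MKALAG', 'seqC': '-K-LSG'}
```

The two operations differ in kind: column filtering decides column by column from the alignment itself, whereas domain cropping removes whole regions (linkers and N- and C-termini) based on an external domain annotation, regardless of how well those regions are aligned.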

Keywords: evolutionary domains; filtration of multiple sequence alignment; phylogenetic inference.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Phylogeny
  • Proteins* / chemistry
  • Proteins* / genetics
  • Sequence Alignment
  • Software*

Substances

  • Proteins