A vectorial tree distance measure

Sci Rep. 2022 Mar 28;12(1):5256. doi: 10.1038/s41598-022-08360-4.

Abstract

A vectorial distance measure for trees is presented. Given two trees, we define a Tree-Alignment (T-Alignment). We T-align the trees from their centers outwards, starting from the root-branches, to make the next level as similar as possible. The algorithm is recursive; condition on the T-alignment of the root-branches we T-align the sub-branches, thereafter each T-alignment is conditioned on the previous one. We define a minimal T-alignment under a lexicographic order which follows the intuition that the differences between the two trees constitutes a vector. Given such a minimal T-alignment, the difference in the number of branches calculated at any level defines the entry of the distance vector at that level. We compare our algorithm to other well-known tree distance measures in the task of clustering sets of phylogenetic trees. We use the TreeSimGM simulator for generating stochastic phylogenetic trees. The vectorial tree distance (VTD) can successfully separate symmetric from asymmetric trees, and hierarchical from non-hierarchical trees. We also test the algorithm as a classifier of phylogenetic trees extracted from two members of the fungi kingdom, mushrooms and mildews, thus showimg that the algorithm can separate real world phylogenetic trees. The Matlab code can be accessed via: https://gitlab.com/avner.priel/vectorial-tree-distance .

MeSH terms

  • Algorithms*
  • Animals
  • Cluster Analysis
  • Disease Vectors*
  • Phylogeny