A new universal system of tree shape indices

bioRxiv [Preprint]. 2023 Nov 27:2023.07.17.549219. doi: 10.1101/2023.07.17.549219.

Abstract

The comparison and categorization of tree diagrams is fundamental to large parts of biology, linguistics, computer science, and other fields, yet the indices currently applied to describing tree shape have important flaws that complicate their interpretation and limit their scope. Here we introduce a new system of indices with no such shortcomings. Our indices account for node sizes and branch lengths and are robust to small changes in either attribute. Unlike currently popular phylogenetic diversity, phylogenetic entropy, and tree balance indices, our definitions assign interpretable values to all rooted trees and enable meaningful comparison of any pair of trees. Our self-consistent definitions further unite measures of diversity, richness, balance, symmetry, effective height, effective outdegree, and effective branch count in a coherent system, and we derive numerous simple relationships between these indices. The main practical advantages of our indices are in 1) quantifying diversity in non-ultrametric trees; 2) assessing the balance of trees that have non-uniform branch lengths or node sizes; 3) comparing the balance of trees with different leaf counts or outdegrees; 4) obtaining a coherent, generic, multidimensional quantification of tree shape that is robust to sampling error and inferential error. We illustrate these features by comparing the shapes of trees representing the evolution of HIV and of Uralic languages, and trees generated by computational models of tumour evolution. Given the ubiquity of tree structures, we identify a wide range of applications across diverse domains.

Keywords: phylogenetic diversity; phylogenetic entropy; rooted trees; tree balance; tree indices; tree shape.

Publication types

  • Preprint