Nonbifurcating Phylogenetic Tree Inference via the Adaptive LASSO

J Am Stat Assoc. 2021;116(534):858-873. doi: 10.1080/01621459.2020.1778481. Epub 2020 Jul 20.

Abstract

Phylogenetic tree inference using deep DNA sequencing is reshaping our understanding of rapidly evolving systems, such as the within-host battle between viruses and the immune system. Densely sampled phylogenetic trees can contain special features, including sampled ancestors, in which we sequence a genotype along with its direct descendants, and polytomies, in which multiple descendants arise simultaneously. These features become apparent once zero-length branches in the tree are identified. However, current maximum-likelihood-based approaches cannot reveal such zero-length branches. In this paper, we find these zero-length branches by introducing adaptive-LASSO-type regularization estimators for the branch lengths of phylogenetic trees, deriving their properties, and showing regularization to be a practically useful approach for phylogenetics.
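
To illustrate how an adaptive-LASSO-type penalty can drive short branch lengths to exactly zero, the following is a minimal sketch and not the estimator defined in the paper. It assumes a toy quadratic loss in place of the phylogenetic negative log-likelihood, weights of the form 1 / pilot^gamma as in the standard adaptive LASSO, and plain proximal gradient descent (ISTA) rather than the accelerated FISTA variant listed in the keywords. All names (`adaptive_lasso_ista`, `pilot`, `lam`, `gamma`) and all numeric values are hypothetical.

```python
import numpy as np


def prox_weighted_l1_nonneg(v, thresh):
    """Proximal operator of sum_i thresh_i * q_i restricted to q >= 0.

    For nonnegative branch lengths this reduces to a one-sided
    soft-threshold: shift down by the threshold, then clip at zero.
    """
    return np.maximum(v - thresh, 0.0)


def adaptive_lasso_ista(grad, q0, weights, lam, step, n_iter=500):
    """Plain proximal gradient (ISTA) for loss(q) + lam * sum_i weights_i * q_i, q >= 0."""
    q = np.asarray(q0, dtype=float).copy()
    for _ in range(n_iter):
        q = prox_weighted_l1_nonneg(q - step * grad(q), step * lam * weights)
    return q


# Assumption: a toy quadratic loss 0.5 * ||q - target||^2 stands in for the
# negative log-likelihood of the branch lengths; its gradient is q - target.
target = np.array([0.30, 0.002, 0.15, 0.001])


def grad(q):
    return q - target


# Assumption: the pilot estimate is an unpenalized fit (here simply `target`).
# It must be strictly positive, otherwise the weights below are undefined.
pilot = target.copy()
gamma = 1.0
weights = 1.0 / pilot**gamma  # short pilot branches receive large penalties

lam = 0.005
q_hat = adaptive_lasso_ista(grad, np.full(4, 0.1), weights, lam, step=1.0)
print(q_hat)
```

In this toy setting the two very short branches are shrunk to exactly zero while the two longer branches are only mildly shrunk, which is the qualitative behavior the abstract attributes to regularization: zero-length branches become identifiable, whereas an unpenalized fit would leave them small but positive.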

Keywords: FISTA; adaptive LASSO; consistency; model selection; phylogenetics; sparsity; ℓ1 regularization.