How Fitch-Margoliash Algorithm can Benefit from Multi Dimensional Scaling

Evol Bioinform Online. 2011;7:61-85. doi: 10.4137/EBO.S7048. Epub 2011 Jun 7.

Abstract

Whatever the phylogenetic method, genetic sequences are often described as strings of characters; molecular sequences can thus be viewed as elements of a multi-dimensional space. As a consequence, studying motion in this space (i.e., the evolutionary process) must deal with the amazing features of high-dimensional spaces, such as the concentration of measure phenomenon. To study how these features might influence phylogeny reconstructions, we examined a particular, popular method: the Fitch-Margoliash algorithm, which belongs to the Least Squares methods. We show that the Least Squares methods are closely related to Multi Dimensional Scaling. Indeed, the criteria for Fitch-Margoliash and Sammon's mapping are somewhat similar. However, the prolific research in Multi Dimensional Scaling has since produced methods that outclass Sammon's mapping. Least Squares methods for tree reconstruction can now take advantage of these improvements. However, "false neighborhoods" and "tears" are the two main risks in the field of dimensionality reduction: a "false neighborhood" corresponds to data that are widely separated in the original space but are found close together in the representation space, whereas neighboring data that are displayed in remote positions constitute a "tear". To address this problem, we applied the concepts of "continuity" and "trustworthiness" to tree reconstruction, which limits the risk of "false neighborhoods" and "tears". We also point out the concentration of measure phenomenon as a source of error and introduce new criteria to build phylogenies with improved preservation of distances and robustness.

The authors and the Evolutionary Bioinformatics Journal dedicate this article to the memory of Professor W.M. Fitch (1929-2011).

Keywords: Fitch-Margoliash; Least Square methods; Multi Dimensional Scaling; Sammon’s mapping; molecular phylogeny.
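
The abstract's remark that the Fitch-Margoliash and Sammon's mapping criteria are "somewhat similar" can be illustrated with a minimal sketch. It assumes the commonly cited textbook forms of both criteria, which the abstract itself does not spell out: the Fitch-Margoliash criterion as a sum of squared errors weighted by 1/D², and Sammon's stress as a sum of squared errors weighted by 1/D and normalised by the total observed distance. The function names and the toy matrices are hypothetical and chosen only for illustration. Here `D` holds the observed pairwise distances and `d` the distances induced by the tree or the low-dimensional map.

```python
from itertools import combinations


def fitch_margoliash_criterion(D, d):
    """Weighted least squares fit: sum over i<j of (D_ij - d_ij)^2 / D_ij^2.

    Assumed textbook form; D is the observed distance matrix and d holds the
    path-length distances read off the candidate tree.
    """
    pairs = combinations(range(len(D)), 2)
    return sum((D[i][j] - d[i][j]) ** 2 / D[i][j] ** 2 for i, j in pairs)


def sammon_stress(D, d):
    """Sammon's stress: (1 / sum D_ij) * sum over i<j of (D_ij - d_ij)^2 / D_ij.

    Assumed textbook form; d holds the distances in the representation space.
    """
    pairs = list(combinations(range(len(D)), 2))
    scale = sum(D[i][j] for i, j in pairs)
    return sum((D[i][j] - d[i][j]) ** 2 / D[i][j] for i, j in pairs) / scale


# Toy example (made-up numbers): three taxa, one distance misfitted by 1.
D_obs = [[0, 2, 4],
         [2, 0, 4],
         [4, 4, 0]]
d_fit = [[0, 2, 5],
         [2, 0, 4],
         [5, 4, 0]]

fm_value = fitch_margoliash_criterion(D_obs, d_fit)
sammon_value = sammon_stress(D_obs, d_fit)
print(fm_value, sammon_value)
```

Under these assumed forms, both criteria penalise the same residuals and differ only in how strongly large observed distances are down-weighted, which is one way to read the similarity the abstract refers to.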