A machine-learning approach reveals that alignment properties alone can accurately predict inference of lateral gene transfer from discordant phylogenies

Mol Biol Evol. 2009 Sep;26(9):1931-9. doi: 10.1093/molbev/msp105. Epub 2009 May 14.

Abstract

Among the methods currently used in phylogenomic practice to detect the presence of lateral gene transfer (LGT), one of the most frequently employed is the comparison of gene tree topologies for different genes. In cases where the phylogenies for different genes are incompatible, or discordant, for well-supported branches there are three simple interpretations for the result: 1) gene duplications (paralogy) followed by many independent gene losses have occurred, 2) LGT has occurred, or 3) the phylogeny is well supported but for reasons unknown is nonetheless incorrect. Here, we focus on the third possibility by examining the properties of 22,437 published multiple sequence alignments, the Bayesian maximum likelihood trees for which either do or do not suggest the occurrence of LGT by the criterion of discordant branches. The alignments that produce discordant phylogenies differ significantly in several salient alignment properties from those that do not. Using a support vector machine, we were able to predict the inference of discordant tree topologies with up to 80% accuracy from alignment properties alone.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Gene Transfer, Horizontal*
  • Genome / genetics
  • Phylogeny*
  • Principal Component Analysis
  • Proteins / chemistry
  • Sequence Alignment / methods*

Substances

  • Proteins