Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins

Cell Syst. 2022 Apr 20;13(4):274-285.e6. doi: 10.1016/j.cels.2022.01.003. Epub 2022 Feb 3.

Abstract

The degree to which evolution is predictable is a fundamental question in biology. Previous attempts to predict the evolution of protein sequences have been limited to specific proteins and to small changes, such as single-residue mutations. Here, we demonstrate that by using a protein language model to predict the local evolution within protein families, we recover a dynamic "vector field" of protein evolution that we call evolutionary velocity (evo-velocity). Evo-velocity generalizes to evolution over vastly different timescales, from viral proteins evolving over years to eukaryotic proteins evolving over geologic eons, and can predict the evolutionary dynamics of proteins that were not used to develop the original model. Evo-velocity also yields new evolutionary insights by predicting strategies of viral-host immune escape, resolving conflicting theories on the evolution of serpins, and revealing a key role of horizontal gene transfer in the evolution of eukaryotic glycolysis.
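The idea of using a model's local predictions to assign a direction between related sequences can be illustrated with a minimal sketch. The abstract gives no formula, so the scoring rule here (mean log-likelihood gain at mismatched positions) is an assumption. The per-position frequency table is a hypothetical stand-in for a protein language model, and the function names `toy_site_log_probs` and `velocity` are illustrative. Sequences are assumed to be aligned and of equal length.

```python
import math
from collections import Counter


def toy_site_log_probs(seqs):
    """Hypothetical stand-in for a language model: per-position residue
    log-frequencies with add-one smoothing over the observed alphabet."""
    alphabet = sorted(set("".join(seqs)))
    total = len(seqs) + len(alphabet)
    table = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        table.append({a: math.log((counts[a] + 1) / total) for a in alphabet})
    return table


def velocity(a, b, table):
    """Directed score for a -> b (assumed form): mean log-likelihood
    gain over the positions where the two sequences differ."""
    diffs = [table[i][y] - table[i][x]
             for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Toy family: three copies of one variant and one copy of a rarer variant.
seqs = ["MKV", "MKV", "MKV", "MRV"]
table = toy_site_log_probs(seqs)

forward = velocity("MRV", "MKV", table)   # rare -> common variant
backward = velocity("MKV", "MRV", table)  # common -> rare variant
print(forward, backward)
```

With this toy scoring, a positive value means the model favors the target sequence over the source, so edges in a sequence-similarity graph can be oriented by the sign of the score. A real language model would condition each position on its sequence context, which this frequency table does not.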

Keywords: evolutionary dynamics; evolutionary predictability; fitness landscape; language models; machine learning; phylogenetic analysis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Evolution, Molecular*
  • Language*
  • Mutation / genetics
  • Proteins / genetics

Substances

  • Proteins