A Generative Angular Model of Protein Structure Evolution

Mol Biol Evol. 2017 Aug 1;34(8):2085-2100. doi: 10.1093/molbev/msx137.

Abstract

Recently described stochastic models of protein evolution have demonstrated that including structural information in addition to amino acid sequences leads to more reliable estimates of evolutionary parameters. We present a generative evolutionary model of protein structure and sequence that is valid on a local length scale. The model describes the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution allows the model to capture both "smooth" conformational changes and "catastrophic" conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model, we identified an apparent sequence-structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of the model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure, or any combination thereof.
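The abstract does not spell out the form of the angular diffusion, so the following is only a minimal sketch of the general idea of a random walk of a (phi, psi) dihedral pair on the torus, not the authors' process. It assumes an Ornstein-Uhlenbeck-style drift toward a mean angle pair, with phi and psi evolving independently, simulated by Euler-Maruyama steps and wrapped back onto [-pi, pi) after each step. The published model may couple the two angles and use an exact transition density instead. The function names `simulate_torus_diffusion` and `catastrophic_jump` and the parameters `mu`, `alpha`, and `sigma` are hypothetical. The jump function is likewise an illustrative stand-in for a "catastrophic" conformational change: it simply redraws the angle pair around a new mean.

```python
import math
import random


def wrap_angle(theta):
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def simulate_torus_diffusion(phi0, psi0, mu, alpha, sigma, t_total, n_steps, seed=None):
    """Hypothetical sketch: Euler-Maruyama simulation of a wrapped OU-like walk on the torus.

    mu    -- (phi, psi) mean the walk is pulled toward (assumed parameter)
    alpha -- (phi, psi) drift strengths, assumed > 0
    sigma -- (phi, psi) diffusion coefficients
    phi and psi are treated as independent here, which is a simplifying assumption.
    """
    rng = random.Random(seed)
    dt = t_total / n_steps
    phi, psi = wrap_angle(phi0), wrap_angle(psi0)
    path = [(phi, psi)]
    for _ in range(n_steps):
        # The shortest signed angular difference keeps the drift consistent with periodicity.
        d_phi = wrap_angle(mu[0] - phi)
        d_psi = wrap_angle(mu[1] - psi)
        noise_phi = sigma[0] * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        noise_psi = sigma[1] * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        phi = wrap_angle(phi + alpha[0] * d_phi * dt + noise_phi)
        psi = wrap_angle(psi + alpha[1] * d_psi * dt + noise_psi)
        path.append((phi, psi))
    return path


def catastrophic_jump(mu_new, alpha, sigma, rng):
    """Hypothetical jump: redraw (phi, psi) around a new mean.

    Uses the unwrapped OU stationary standard deviation sigma / sqrt(2 * alpha),
    then wraps the draw onto the torus. Requires alpha > 0.
    """
    sd_phi = sigma[0] / math.sqrt(2.0 * alpha[0])
    sd_psi = sigma[1] / math.sqrt(2.0 * alpha[1])
    return (wrap_angle(rng.gauss(mu_new[0], sd_phi)),
            wrap_angle(rng.gauss(mu_new[1], sd_psi)))


if __name__ == "__main__":
    # Smooth change: drift from one region of the Ramachandran plane toward an assumed mean.
    path = simulate_torus_diffusion(phi0=-1.0, psi0=2.4, mu=(-1.1, -0.7),
                                    alpha=(1.0, 1.0), sigma=(0.5, 0.5),
                                    t_total=1.0, n_steps=100, seed=1)
    print("end of smooth walk:", path[-1])
    # Jump: an abrupt move to a different (assumed) conformational region.
    print("after jump:", catastrophic_jump((1.0, 0.8), (1.0, 1.0), (0.5, 0.5), random.Random(2)))
```

Wrapping the angular difference before applying the drift is the key design choice: it makes an angle just below pi and one just above -pi count as neighbors, so the walk is pulled across the boundary rather than the long way around the circle.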

Keywords: directional statistics; evolution; probabilistic model; protein structure.