Deciphering protein evolution and fitness landscapes with latent space models

Xinqiang Ding; Zhengting Zou; Charles L Brooks III

doi:10.1038/s41467-019-13633-0

Nat Commun. 2019 Dec 10;10(1):5644. doi: 10.1038/s41467-019-13633-0.

Authors

Xinqiang Ding¹, Zhengting Zou², Charles L Brooks III³,⁴,⁵

Affiliations

¹ Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA.
² Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA.
³ Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA. brookscl@umich.edu.
⁴ Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA. brookscl@umich.edu.
⁵ Biophysics Program, University of Michigan, Ann Arbor, MI, 48109, USA. brookscl@umich.edu.

Abstract

Protein sequences contain rich information about protein evolution, fitness landscapes, and stability. Here we investigate how latent space models trained using variational auto-encoders can infer these properties from sequences. Using both simulated and real sequences, we show that the low-dimensional latent space representation of sequences, calculated using the encoder model, captures both evolutionary and ancestral relationships between sequences. Together with experimental fitness data and Gaussian process regression, the latent space representation also enables learning the protein fitness landscape in a continuous low-dimensional space. Moreover, the model is useful in predicting protein mutational stability landscapes and quantifying the importance of stability in shaping protein evolution. Overall, we illustrate that the latent space models learned using variational auto-encoders provide a mechanism for exploring the rich data contained in protein sequences regarding evolution, fitness and stability, and hence are well-suited to help guide protein engineering efforts.
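The abstract names Gaussian process regression on latent space coordinates as the route from measured fitness values to a continuous fitness landscape. The sketch below illustrates that general idea in pure Python under stated assumptions: a zero-mean GP fitted to mean-centered fitness values, a squared-exponential (RBF) kernel, and fixed, hand-picked hyperparameters. The kernel choice, the function names (`rbf_kernel`, `gp_fit_predict`, etc.), the 2-D latent points and the fitness values are all hypothetical illustrations, not the authors' implementation or data.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two latent points (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(m):
    """Lower-triangular L with L L^T == m for a symmetric positive-definite m."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_from_lower(L, b):
    """Back substitution: solve L^T x = b using the lower factor L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x


def gp_fit_predict(train_z, train_y, test_z,
                   length_scale=1.0, signal_var=1.0, noise_var=1e-2):
    """Posterior mean and latent-function variance of a GP at each test point.

    train_z: latent coordinates of sequences with measured fitness (hypothetical).
    train_y: measured fitness values; centered here so a zero-mean GP applies.
    """
    n = len(train_z)
    y_mean = sum(train_y) / n
    y_c = [y - y_mean for y in train_y]
    # Covariance of the noisy observations: K + noise_var * I.
    K = [[rbf_kernel(train_z[i], train_z[j], length_scale, signal_var)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_from_lower(L, solve_lower(L, y_c))  # (K + noise I)^-1 y_c
    means, variances = [], []
    for z in test_z:
        k_star = [rbf_kernel(z, zi, length_scale, signal_var) for zi in train_z]
        means.append(y_mean + sum(k * a for k, a in zip(k_star, alpha)))
        v = solve_lower(L, k_star)  # ||v||^2 == k*^T (K + noise I)^-1 k*
        variances.append(rbf_kernel(z, z, length_scale, signal_var)
                         - sum(vi * vi for vi in v))
    return means, variances


if __name__ == "__main__":
    # Hypothetical latent coordinates and fitness values, for illustration only.
    z_train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    fitness = [0.2, 0.8, 0.5, 1.0]
    mu, var = gp_fit_predict(z_train, fitness, [(0.5, 0.5), (3.0, 3.0)])
    for point, m, s2 in zip([(0.5, 0.5), (3.0, 3.0)], mu, var):
        print(point, round(m, 3), round(s2, 3))
```

Far from any measured sequence the prediction falls back to the mean fitness with variance near the signal variance, which is how a GP expresses uncertainty over unexplored regions of the latent space. In practice the length scale, signal variance and noise variance would usually be fitted, for example by maximizing the marginal likelihood, rather than fixed as in this sketch.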

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Evolution, Molecular*
Genetic Fitness*
Models, Genetic*
Mutation / genetics
Phylogeny
Protein Stability
Proteins / genetics*
Sequence Alignment

Substances

Proteins