Decomposing Structural Response Due to Sequence Changes in Protein Domains with Machine Learning

J Mol Biol. 2020 Jul 24;432(16):4435-4446. doi: 10.1016/j.jmb.2020.05.021. Epub 2020 May 30.

Abstract

How protein domain structure changes in response to mutations is not well understood. Some mutations change the structure drastically, while most result in only small changes. To better understand this response, we decompose the relationship between changes in domain sequence and structure using machine learning. We select pairs of evolutionarily related domains spanning a broad range of evolutionary distances. In contrast to earlier studies, we do not find a strictly linear relationship between sequence and structural changes. We train a random forest regressor that predicts the structural similarity between pairs with an average accuracy of 0.029 lDDT (local Distance Difference Test) score and a correlation coefficient of 0.92. Decomposing the feature importance shows that the domain length, or, analogously, size, is the most important feature. Our model enables assessing deviations in relative structural response, and thus prediction of evolutionary trajectories, in protein domains across evolution.
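
To make the structural similarity measure concrete, here is a minimal sketch of an lDDT-style score. It is an illustration only and not the authors' implementation. It assumes C-alpha coordinates only, two structures already in residue-by-residue correspondence, and the commonly used 15 Å inclusion radius with tolerance thresholds of 0.5, 1, 2 and 4 Å. The full lDDT works on all atoms and adds stereochemical checks, which are omitted here. The function name `ca_lddt` and its parameters are hypothetical.

```python
import math


def ca_lddt(ref, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha-only lDDT-style score (assumption: not full lDDT).

    ref, model: equal-length sequences of (x, y, z) coordinates in angstroms,
    where position i in both structures is the same aligned residue.
    Returns a value in [0, 1]; 1.0 means all local distances are preserved.
    """
    if len(ref) != len(model):
        raise ValueError("structures must have the same number of residues")

    preserved = 0  # (pair, threshold) combinations within tolerance
    total = 0      # all (pair, threshold) combinations considered

    n = len(ref)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(ref[i], ref[j])
            # Only distances that are local in the reference are scored.
            if d_ref >= cutoff:
                continue
            diff = abs(d_ref - math.dist(model[i], model[j]))
            total += len(thresholds)
            preserved += sum(diff < t for t in thresholds)

    return preserved / total if total else 0.0


# Usage: three residues on a line, then the same chain with the last one shifted.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
print(ca_lddt(reference, reference))  # identical structures score 1.0
print(ca_lddt(reference, shifted))    # 0.75
```

Because the score compares internal distances rather than superimposed coordinates, no structural alignment step is needed before scoring.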

Keywords: evolutionary distance; mutations; protein evolution; protein structure.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Evolution, Molecular
  • Machine Learning
  • Models, Molecular
  • Mutation*
  • Protein Conformation
  • Protein Domains
  • Proteins / chemistry*
  • Proteins / genetics

Substances

  • Proteins