Machine Learning Driven Analysis of Large Scale Simulations Reveals Conformational Characteristics of Ubiquitin Chains

J Chem Theory Comput. 2020 May 12;16(5):3205-3220. doi: 10.1021/acs.jctc.0c00045. Epub 2020 Apr 7.

Abstract

Understanding the conformational characteristics of protein complexes in solution is crucial for a deeper insight into their biological function. Molecular dynamics simulations performed on high-performance computing facilities and with modern simulation techniques can be used to obtain large data sets that contain conformational and thermodynamic information about biomolecular systems. While this can in principle give a detailed picture of protein-protein interactions in solution and therefore complement experimental data, it also raises the challenge of processing exceedingly large, high-dimensional data sets with several million samples. Here we present a novel method for the characterization of protein-protein interactions. It combines a neural-network-based dimensionality reduction technique, which yields a two-dimensional representation of the conformational space, with a density-based clustering algorithm for state detection and a metric that assesses the (dis)similarity between different conformational spaces. This method is highly scalable and therefore makes the analysis of massive data sets computationally tractable. We demonstrate the power of this approach to large-scale data analysis by characterizing highly dynamic conformational phase spaces of differently linked ubiquitin (Ub) oligomers from coarse-grained simulations. We are able to extract a protein-protein interaction model for two unlinked Ub proteins, which is then used to determine how the Ub-Ub interaction pattern is altered in Ub oligomers by the introduction of a covalent linkage. We find that the Ub chain conformational ensemble depends strongly on the linkage type and in some cases also on the Ub chain length. In this way, we obtain insight into the conformational characteristics of different Ub chains and how these may contribute to linkage-type- and chain-length-specific recognition.
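
The idea of a (dis)similarity measure between two conformational spaces can be illustrated with a small sketch. The abstract does not name the metric the authors use, so the code below substitutes a Jensen-Shannon divergence between histograms of two already-projected two-dimensional ensembles. The function names (`histogram2d`, `js_divergence`), the grid bounds, the bin count, and the synthetic Gaussian "ensembles" are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random
from collections import Counter


def histogram2d(points, lo, hi, bins):
    """Bin 2D points on a shared square grid; return cell -> probability."""
    width = (hi - lo) / bins
    counts = Counter()
    for x, y in points:
        # Clamp indices so out-of-range points fall into the edge cells.
        i = min(max(int((x - lo) / width), 0), bins - 1)
        j = min(max(int((y - lo) / width), 0), bins - 1)
        counts[(i, j)] += 1
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    cells = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 on the union of occupied cells.
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in cells}

    def kl(a):
        return sum(pa * math.log2(pa / m[c]) for c, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def gaussian_ensemble(rng, center, n):
    """Synthetic stand-in for a 2D projection of a simulated ensemble."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            for _ in range(n)]


rng = random.Random(0)
ensemble_a = gaussian_ensemble(rng, (0.0, 0.0), 2000)
ensemble_b = gaussian_ensemble(rng, (0.0, 0.0), 2000)  # same state as a
ensemble_c = gaussian_ensemble(rng, (3.0, 3.0), 2000)  # shifted state

# All ensembles must be binned on the same grid to be comparable.
hist_a = histogram2d(ensemble_a, lo=-5.0, hi=8.0, bins=20)
hist_b = histogram2d(ensemble_b, lo=-5.0, hi=8.0, bins=20)
hist_c = histogram2d(ensemble_c, lo=-5.0, hi=8.0, bins=20)

js_ab = js_divergence(hist_a, hist_b)
js_ac = js_divergence(hist_a, hist_c)
print(f"similar ensembles:   {js_ab:.3f}")
print(f"different ensembles: {js_ac:.3f}")
```

Ensembles sampled from the same region give a divergence close to zero, whereas ensembles occupying different regions of the map give a value closer to one. Because the comparison operates on binned densities rather than on individual samples, its cost after binning depends on the number of occupied grid cells and not on the number of simulation frames.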

MeSH terms

  • Machine Learning*
  • Molecular Dynamics Simulation*
  • Protein Conformation
  • Ubiquitin / chemistry*

Substances

  • Ubiquitin