Machine Learning Deciphered Molecular Mechanistics with Accurate Kinetic and Thermodynamic Prediction

J Chem Theory Comput. 2024 Feb 23. doi: 10.1021/acs.jctc.3c01412. Online ahead of print.

Abstract

Time-lagged independent component analysis (tICA) and Markov state models (MSMs) have been used extensively to extract conformational dynamics and kinetic community networks from ensembles of unbiased trajectories. However, these techniques may not be the best choice for elucidating transition mechanisms in low-dimensional representations, especially for intricate biosystems. Unraveling the association mechanism in such complex systems requires trying combinations of several essential independent components or collective variables, a selection process that is inherently opaque and may depend on empirical knowledge. To address these challenges, we implemented an integrated unsupervised dimension reduction model that combines uniform manifold approximation and projection (UMAP) with hierarchical density-based spatial clustering of applications with noise (HDBSCAN). This approach generates low-dimensional configurational embeddings. Applied hierarchically and combined with MSMs, this architecture reveals global kinetic connectivity while identifying local conformational states, yielding a multiscale framework for mechanistic elucidation. Leveraging a uniform sample distribution and a denoising approach, our model preserves both global and local data structure more robustly than traditional dimension reduction methods used in molecular dynamics (MD) analysis. The interpretability of hyperparameter selection and the compatibility with downstream tasks are cross-validated across several simulation data sets, using both computational evaluation metrics and experimental kinetic observables. Furthermore, the predicted Mcl1-BH3 association rate (0.76 s⁻¹) agrees to within an order of magnitude with the value measured by surface plasmon resonance (SPR) experiments (0.12 s⁻¹), supporting the plausibility of the identified pathway, which is composed of representative conformations.
We anticipate that this workflow will serve as a foundational framework for studying recognition patterns in complex biological systems, with applications to exploring protein functional dynamics and to rational drug design.
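For readers less familiar with the tICA baseline that the abstract contrasts against, the following is a minimal NumPy sketch of a standard tICA estimator. It is not the authors' code or any package's API. It assumes a single trajectory stored as an array of shape (n_frames, n_features) and a lag time given in frames; the names `tica` and `tica_timescales` and their arguments are illustrative. It solves the generalized eigenproblem C(τ)v = λC(0)v by whitening with C(0)^(-1/2), so the leading components are the slowest-decorrelating linear combinations of the input features, and each eigenvalue maps to an implied timescale t = -τ / ln λ.

```python
import numpy as np


def tica(X, lag, n_components=2, eps=1e-10):
    """Minimal tICA for a single trajectory X of shape (n_frames, n_features).

    Returns the projection onto the slowest components, their eigenvalues
    (autocorrelations at `lag`), and the corresponding eigenvectors.
    """
    X = np.asarray(X, dtype=float)
    X0, Xt = X[:-lag], X[lag:]
    # Shared mean over both windows so C(0) and C(tau) are centered alike.
    mean = 0.5 * (X0.mean(axis=0) + Xt.mean(axis=0))
    X0c, Xtc = X0 - mean, Xt - mean
    n = X0c.shape[0]
    # Instantaneous and time-lagged covariances, both symmetrized
    # (a common reversible tICA estimate).
    C0 = (X0c.T @ X0c + Xtc.T @ Xtc) / (2 * n)
    Ct = (X0c.T @ Xtc + Xtc.T @ X0c) / (2 * n)
    # Whiten with C0^(-1/2), discarding near-singular directions.
    s, U = np.linalg.eigh(C0)
    keep = s > eps
    W = U[:, keep] / np.sqrt(s[keep])
    # Ordinary symmetric eigenproblem in whitened coordinates.
    lam, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(lam)[::-1]  # slowest (largest eigenvalue) first
    lam, vecs = lam[order], W @ V[:, order]
    proj = (X - mean) @ vecs[:, :n_components]
    return proj, lam[:n_components], vecs[:, :n_components]


def tica_timescales(lam, lag):
    """Implied timescales in frames, t_i = -lag / ln(lambda_i); valid for 0 < lambda_i < 1."""
    return -lag / np.log(lam)
```

Choosing which of these components to combine into a mechanistic picture is the manual, empirically guided step that the abstract identifies as a weakness, and which the UMAP/HDBSCAN embedding is meant to address.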
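The experimental comparison in the abstract relies on kinetics extracted from an MSM. The sketch below illustrates, under stated assumptions, one common way to obtain such a rate: it counts transitions at lag τ in a discretized trajectory (for example, cluster labels), builds a row-stochastic transition matrix, reports implied timescales t_i = -τ / ln|λ_i|, and computes mean first passage times (MFPTs) into a target set of states, whose inverse gives a first-order rate. The helper names `estimate_msm`, `implied_timescales`, and `mfpt`, the count-symmetrization shortcut for detailed balance, and the single-trajectory input are all assumptions for illustration; this is not presented as the paper's estimator.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag):
    """Row-stochastic transition matrix from one discrete trajectory.

    dtraj holds integer state labels 0..n_states-1 (noise frames, e.g.
    HDBSCAN's -1 label, are assumed to have been removed or reassigned),
    and every state is assumed to be visited at least once.
    Counts are symmetrized (C + C^T), a simple way to enforce detailed
    balance; this is not a maximum-likelihood reversible estimator.
    """
    dtraj = np.asarray(dtraj)
    C = np.zeros((n_states, n_states))
    # Accumulate one count for every (state at t, state at t + lag) pair.
    np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1.0)
    C = C + C.T
    return C / C.sum(axis=1, keepdims=True)


def implied_timescales(T, lag):
    """t_i = -lag / ln|lambda_i| for all eigenvalues except the stationary one."""
    ev = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(ev[1:])


def mfpt(T, target, lag):
    """Mean first passage time (in frames) from every state into `target`.

    Solves m_i = lag + sum_j T_ij m_j for states outside the target set,
    with m_i = 0 for target states.
    """
    n = T.shape[0]
    target = np.atleast_1d(target)
    others = np.setdiff1d(np.arange(n), target)
    A = np.eye(len(others)) - T[np.ix_(others, others)]
    m = np.zeros(n)
    m[others] = np.linalg.solve(A, np.full(len(others), float(lag)))
    return m


# Toy two-state example: state 0 = "unbound", state 1 = "bound" (hypothetical).
T = np.array([[0.99, 0.01],
              [0.02, 0.98]])
print(implied_timescales(T, lag=1))   # slowest relaxation, about 32.8 frames
print(mfpt(T, target=[1], lag=1)[0])  # about 100 frames from state 0 into state 1
```

To express the rate in s⁻¹, multiply the MFPT (in frames) by the saved-frame interval Δt before inverting, k = 1 / (MFPT · Δt). For a bimolecular binding process, comparing against an experimental association rate constant may additionally require normalizing by the ligand concentration in the simulation box.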