Molecular Latent Space Simulators for Distributed and Multimolecular Trajectories

J Phys Chem A. 2023 Jun 29;127(25):5470-5490. doi: 10.1021/acs.jpca.3c01362. Epub 2023 Jun 14.

Abstract

All-atom molecular dynamics (MD) simulations offer a powerful tool for molecular modeling, but the short time steps required for numerical stability of the integrator place many interesting molecular events out of reach of unbiased simulations. The popular and powerful Markov state modeling (MSM) approach can extend these time scales by stitching together multiple short discontinuous trajectories into a single long-time kinetic model, but it necessitates a configurational coarse-graining of the phase space that entails a loss of spatial and temporal resolution and an exponential increase in complexity for multimolecular systems. Latent space simulators (LSS) present an alternative formalism that employs a dynamical, as opposed to configurational, coarse-graining comprising three back-to-back learning problems to (i) identify the molecular system's slowest dynamical processes, (ii) propagate the microscopic system dynamics within this slow subspace, and (iii) generatively reconstruct the trajectory of the system within the molecular phase space. A trained LSS model can generate temporally and spatially continuous synthetic molecular trajectories at orders of magnitude lower cost than MD, improving sampling of rare transition events and metastable states and thereby reducing statistical uncertainties in thermodynamic and kinetic observables. In this work, we extend the LSS formalism to short discontinuous training trajectories generated by distributed computing and to multimolecular systems without incurring exponential scaling in computational cost. First, we develop a distributed LSS model over thousands of short simulations of a 264-residue proteolysis-targeting chimera (PROTAC) complex to generate ultralong continuous trajectories that identify metastable states and collective variables to inform PROTAC therapeutic design and optimization. Second, we develop a multimolecular LSS architecture to generate physically realistic ultralong trajectories of DNA oligomers that can undergo both duplex hybridization and hairpin folding. These trajectories retain the thermodynamic and kinetic characteristics of the training data while providing increased precision of folding populations and time scales across simulation temperature and ion concentration.
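The abstract notes that an MSM stitches many short discontinuous trajectories into a single kinetic model. The sketch below illustrates that idea only; it is not the LSS method of this work. It assumes the trajectories have already been assigned to discrete states, uses a plain row-normalized (non-reversible) count estimator, and the function names `count_transitions`, `transition_matrix`, and `implied_timescales` are hypothetical. The key point is that transitions are counted within each trajectory and never across the boundary between two independent runs.

```python
import numpy as np

def count_transitions(trajs, n_states, lag_steps):
    """Accumulate lagged transition counts over many short trajectories.

    Pairs (s_t, s_{t+lag}) are taken within each trajectory only, so no
    spurious transition is counted across two independent runs.
    """
    counts = np.zeros((n_states, n_states))
    for traj in trajs:
        traj = np.asarray(traj, dtype=int)
        if len(traj) <= lag_steps:
            continue  # too short to contribute a lagged pair
        np.add.at(counts, (traj[:-lag_steps], traj[lag_steps:]), 1.0)
    return counts

def transition_matrix(counts):
    """Row-normalize counts into a row-stochastic transition matrix."""
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("a state has no outgoing transitions")
    return counts / row_sums

def implied_timescales(T, lag_steps):
    """Relaxation time scales t_i = -lag / ln|lambda_i|, in units of frames."""
    mags = np.sort(np.abs(np.linalg.eigvals(T)))[::-1][1:]  # drop lambda = 1
    mags = mags[(mags > 0) & (mags < 1)]
    return -lag_steps / np.log(mags)

# Toy data (assumption): three independent runs over two discrete states.
trajs = [[0, 0, 0, 1], [1, 1, 1, 0], [0, 0, 1, 1]]
counts = count_transitions(trajs, n_states=2, lag_steps=1)
T = transition_matrix(counts)
timescales = implied_timescales(T, lag_steps=1)
```

Concatenating the three toy runs naively would add two transitions at the joins that were never simulated; counting per trajectory avoids this. Production MSM estimators typically also enforce detailed balance and validate the choice of lag time, which this sketch omits.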