Quantifying the effects of lossy compression on energies calculated from molecular dynamics trajectories

Protein Sci. 2022 Dec;31(12):e4511. doi: 10.1002/pro.4511.

Abstract

Molecular dynamics (MD) simulations are now able to routinely reach timescales of microseconds and beyond. This has led to a corresponding increase in the amount of MD trajectory data that needs to be stored, particularly when those trajectories contain explicit solvent molecules. As such, it is desirable to be able to compress trajectory data while still retaining as much of the original information as possible. In this work, we describe compressing MD trajectory data using the NetCDF4/HDF5 file format, making use of quantization of the original positions to achieve better compression ratios. We also analyze the effect this has on both the resulting positions and the energies calculated from post-processing these trajectories, and recommend an optimal level of quantization. Overall, we find the NetCDF4/HDF5 format to be an excellent choice for storing MD trajectory data in terms of speed, compressibility, and versatility.
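The idea of quantizing positions before lossless compression can be illustrated with a minimal sketch. This is not the implementation used in the paper: the function names, the synthetic coordinates, the choice of three retained decimal digits, and the use of plain zlib in place of the NetCDF4/HDF5 compression filters are all assumptions made for illustration. The sketch rounds each coordinate to a power-of-two grid fine enough to preserve the requested decimal digits, which zeroes the trailing mantissa bits of the stored 32-bit floats and so leaves less entropy for the compressor to encode.

```python
import math
import random
import struct
import zlib


def quantize_positions(values, decimals):
    """Round values to a power-of-two grid that preserves `decimals` digits.

    Hypothetical helper: the scale is the smallest power of two that is at
    least 10**decimals, so the rounding error is at most 0.5 / scale.
    """
    bits = math.ceil(math.log2(10 ** decimals))
    scale = 2.0 ** bits
    return [round(v * scale) / scale for v in values], scale


def compressed_size(values):
    """Pack values as little-endian 32-bit floats and zlib-compress them."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, 6))


# Synthetic coordinates in angstroms (assumed data, not a real trajectory).
random.seed(0)
coords = [random.uniform(0.0, 50.0) for _ in range(30000)]

quantized, scale = quantize_positions(coords, decimals=3)

max_error = max(abs(a - b) for a, b in zip(coords, quantized))
size_original = compressed_size(coords)
size_quantized = compressed_size(quantized)

print(f"max position error: {max_error:.6f} angstrom")
print(f"compressed bytes, original:  {size_original}")
print(f"compressed bytes, quantized: {size_quantized}")
```

The trade-off examined in the paper appears here in miniature: retaining fewer digits shrinks the compressed size but raises the bound on the position error, which in turn perturbs any energies recomputed from the stored coordinates.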

Keywords: data analysis; energy calculation; molecular dynamics; trajectory compression.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Data Compression* / methods
  • Molecular Dynamics Simulation*
  • Solvents

Substances

  • Solvents