Algorithm and data structures for efficient energy maintenance during Monte Carlo simulation of proteins

J Comput Biol. 2004;11(5):902-32. doi: 10.1089/cmb.2004.11.902.

Abstract

Monte Carlo simulation (MCS) is a common methodology for computing pathways and thermodynamic properties of proteins. A simulation run is a series of random steps in conformation space, each perturbing some degrees of freedom of the molecule. A step is accepted with a probability that depends on the resulting change in the value of an energy function. Typical energy functions sum many terms; the most costly ones to compute are those contributed by atom pairs closer than some cutoff distance. This paper introduces a new method that speeds up MCS by exploiting two facts: proteins are long kinematic chains, and few degrees of freedom are changed at each step. A novel data structure, called the ChainTree, captures both the kinematics and the shape of a protein at successive levels of detail. It is used to efficiently detect self-collision (steric clash between atoms) and/or find all atom pairs contributing to the energy. It also makes it possible to identify partial energy sums left unchanged by a perturbation, so that the energy value can be updated incrementally. Computational tests on four proteins ranging in size from 68 to 755 amino acids show that MCS with the ChainTree method is significantly faster (up to 10 times faster for the largest protein) than with the widely used grid method. They also indicate that the speed-up grows with protein size.
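
The incremental-update idea can be illustrated with a minimal sketch. This is not the ChainTree, and it is not the authors' code. It uses a hypothetical toy model: a 2D chain with unit bonds, one turn angle per joint, and a made-up smooth contact energy that is zero beyond a cutoff (`CUTOFF`, `pair_energy`, `build_chain`, `delta_energy` and `mc_step` are all illustrative names). When one joint angle changes, the atoms on each side of the joint move rigidly relative to each other, so the partial sums over pairs lying entirely on one side are unchanged. Only pairs that straddle the joint need to be re-evaluated. A brute-force loop over those cross pairs stands in for the ChainTree's hierarchical pruning, and the acceptance rule is the standard Metropolis criterion.

```python
import math
import random

CUTOFF = 2.5  # hypothetical cutoff distance, in bond-length units


def pair_energy(p, q):
    """Hypothetical smooth contact term: negative inside the cutoff, exactly zero beyond it."""
    d = math.dist(p, q)
    return -(1.0 - d / CUTOFF) ** 2 if d < CUTOFF else 0.0


def build_chain(angles):
    """Place atoms in 2D with unit bonds; angles[i] is the turn applied before bond i.
    Changing angles[k] rigidly rotates atoms k+1.. about atom k."""
    pos = [(0.0, 0.0)]
    heading = 0.0
    for a in angles:
        heading += a
        x, y = pos[-1]
        pos.append((x + math.cos(heading), y + math.sin(heading)))
    return pos


def total_energy(pos):
    """Full O(n^2) sum over non-bonded pairs (atoms at least two bonds apart)."""
    n = len(pos)
    return sum(pair_energy(pos[i], pos[j]) for i in range(n) for j in range(i + 2, n))


def delta_energy(old, new, k):
    """Energy change after perturbing joint k. Pairs lying entirely on one side of
    the joint keep their distances, so only cross pairs (i <= k < j) are re-evaluated."""
    d = 0.0
    for i in range(k + 1):
        for j in range(max(k + 1, i + 2), len(old)):
            d += pair_energy(new[i], new[j]) - pair_energy(old[i], old[j])
    return d


def mc_step(angles, pos, energy, temperature, rng):
    """One Metropolis step: perturb one random joint angle and accept the move
    with probability min(1, exp(-dE / T)); the energy is updated incrementally."""
    k = rng.randrange(len(angles))
    trial = list(angles)
    trial[k] += rng.uniform(-0.5, 0.5)
    new_pos = build_chain(trial)
    d_e = delta_energy(pos, new_pos, k)
    if d_e <= 0 or rng.random() < math.exp(-d_e / temperature):
        return trial, new_pos, energy + d_e
    return angles, pos, energy


if __name__ == "__main__":
    rng = random.Random(0)
    angles = [0.0] * 20  # 21 atoms, initially a straight chain
    pos = build_chain(angles)
    energy = total_energy(pos)
    for _ in range(1000):
        angles, pos, energy = mc_step(angles, pos, energy, 0.5, rng)
    # The incrementally maintained energy should match a full recomputation up to rounding.
    print(energy, total_energy(pos))
```

In this sketch each step still costs O(k(n - k)) pair evaluations for the cross pairs. According to the abstract, the ChainTree's contribution is to avoid even that work, by using its levels of detail to skip groups of atoms that cannot be within the cutoff, and by reusing cached partial sums.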

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Kinetics
  • Models, Molecular*
  • Monte Carlo Method*
  • Proteins / chemistry*

Substances

  • Proteins