A Maximum-Likelihood Approach to Force-Field Calibration

Bartłomiej Zaborowski; Dawid Jagieła; Cezary Czaplewski; Anna Hałabis; Agnieszka Lewandowska; Wioletta Żmudzińska; Stanisław Ołdziej; Agnieszka Karczyńska; Christian Omieczynski; Tomasz Wirecki; Adam Liwo

doi:10.1021/acs.jcim.5b00395

J Chem Inf Model. 2015 Sep 28;55(9):2050-70. doi: 10.1021/acs.jcim.5b00395. Epub 2015 Aug 20.

Affiliations

¹ Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland.
² Laboratory of Biopolymer Structure, Intercollegiate Faculty of Biotechnology, University of Gdańsk and Medical University of Gdańsk, Kładki 24, 80-922 Gdańsk, Poland.
³ Center for In Silico Protein Structure and School of Computational Sciences, Korea Institute for Advanced Study, 87 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea.

PMID: 26263302

Abstract

A new approach to the calibration of force fields is proposed, in which the force-field parameters are obtained by maximum-likelihood fitting of the calculated conformational ensembles to the experimental ensembles of the training system(s). The maximum-likelihood function is composed of logarithms of the Boltzmann probabilities of the experimental conformations, calculated with the current energy function. Because the theoretical distribution is available only in the form of the simulated conformations, the contributions from all of the simulated conformations, with Gaussian weights in the distances from a given experimental conformation, are added to give the contribution to the target function from this conformation. In contrast to earlier methods for force-field calibration, the approach does not suffer from the arbitrariness of dividing the decoy set into native-like and non-native structures; however, if such a division is made instead of using Gaussian weights, application of the maximum-likelihood method results in the well-known energy-gap maximization. The computational procedure consists of cycles of decoy generation and maximum-likelihood-function optimization, which are iterated until convergence is reached. The method was tested with Gaussian distributions and then applied to the physics-based coarse-grained UNRES force field for proteins. The NMR structures of the tryptophan cage, a small α-helical protein, determined at three temperatures (T = 280, 305, and 313 K) by Hałabis et al. (J. Phys. Chem. B 2012, 116, 6898-6907), were used. Multiplexed replica-exchange molecular dynamics was used to generate the decoys. The iterative procedure exhibited steady convergence.
Three variants of optimization were tried: optimization of the energy-term weights alone and use of the experimental ensemble of the folded protein only at T = 280 K (run 1); optimization of the energy-term weights and use of experimental ensembles at all three temperatures (run 2); and optimization of the energy-term weights and the coefficients of the torsional and multibody energy terms and use of experimental ensembles at all three temperatures (run 3). The force fields were subsequently tested with a set of 14 α-helical and two α + β proteins. Optimization run 1 resulted in better agreement with the experimental ensemble at T = 280 K compared with optimization run 2 and in comparable performance on the test set but poorer agreement of the calculated folding temperature with the experimental folding temperature. Optimization run 3 resulted in the best fit of the calculated ensembles to the experimental ones for the tryptophan cage but in much poorer performance on the test set, suggesting that use of a small α-helical protein for extensive force-field calibration resulted in overfitting of the data for this protein at the expense of transferability. The optimized force field resulting from run 2 was found to fold 13 of the 14 tested α-helical proteins and one small α + β protein with the correct topologies; the average structures of 10 of them were predicted with accuracies of about 5 Å Cα root-mean-square deviation or better. Test simulations with an additional set of 12 α-helical proteins demonstrated that this force field performed better on α-helical proteins than the previous parametrizations of UNRES. The proposed approach is applicable to any problem of maximum-likelihood parameter estimation when the contributions to the maximum-likelihood function cannot be evaluated at the experimental points and the dimension of the configurational space is too high to construct histograms of the experimental distributions.
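
The target function described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `log_likelihood`, the parameters `sigma` and `kT`, and the simplification that the decoys form a plain discrete set with Boltzmann probabilities normalized over that set are all assumptions made here for illustration. The abstract does not give the exact functional form, the distance measure, or any reweighting scheme used in the paper.

```python
import numpy as np

def log_likelihood(energies, dists, sigma, kT=1.0):
    """Gaussian-weighted log-likelihood of experimental conformations.

    energies : (n_decoys,) energies of the simulated conformations under
               the current force-field parameters (hypothetical input).
    dists    : (n_exp, n_decoys) distances between each experimental and
               each simulated conformation (distance measure is assumed).
    sigma    : width of the Gaussian weight (assumed free parameter).
    """
    energies = np.asarray(energies, dtype=float)
    dists = np.asarray(dists, dtype=float)

    # Log Boltzmann probabilities, normalized over the decoy set.
    log_w = -energies / kT
    log_p = log_w - np.logaddexp.reduce(log_w)

    # Log Gaussian weights in the experiment-to-decoy distances.
    log_g = -dists**2 / (2.0 * sigma**2)

    # For each experimental conformation, sum the weighted contributions
    # of all decoys, then add the logarithms over experimental conformations.
    per_exp = np.logaddexp.reduce(log_p[None, :] + log_g, axis=1)
    return float(per_exp.sum())

# Toy case: one experimental conformation, one decoy on top of it
# (distance 0) and one far away (distance 10).
dists = np.array([[0.0, 10.0]])
ll_flat = log_likelihood([0.0, 0.0], dists, sigma=1.0)
ll_gap = log_likelihood([-2.0, 0.0], dists, sigma=1.0)
```

In this toy case, lowering the energy of the decoy that coincides with the experimental conformation raises the likelihood (`ll_gap > ll_flat`), which is the qualitative behavior behind the connection to energy-gap maximization mentioned in the abstract. In a calibration cycle, such a function would be maximized with respect to the force-field parameters that determine `energies`, followed by a new round of decoy generation.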

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Calibration
Likelihood Functions
Models, Biological
Molecular Dynamics Simulation*
Peptides / chemistry*

Substances

Peptides