Predicting the impacts of mutations on protein-ligand binding affinity based on molecular dynamics simulations and machine learning methods

Comput Struct Biotechnol J. 2020 Feb 20:18:439-454. doi: 10.1016/j.csbj.2020.02.007. eCollection 2020.

Abstract

Purpose: Mutation-induced variation of protein-ligand binding affinity underlies many genetic diseases and the emergence of drug resistance, so predicting the impact of such mutations is of great importance. In this work, we aim to predict the impact of mutations on protein-ligand binding affinity using efficient, structure-based computational methods.

Methods: Relying on consolidated databases of experimentally determined data, we characterize the affinity change upon mutation using a number of local geometrical features and monitor how these features differ upon mutation during molecular dynamics (MD) simulations. The differences are quantified as average differences, trajectory-wise distances, or time-varying differences. Machine-learning methods are then employed to predict the mutation impacts from the resulting conventional or time-series features. Predictions based on energy estimation and on molecular descriptors were conducted as benchmarks.
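
The three ways of quantifying feature differences named in the Methods can be illustrated with a minimal sketch. It assumes that each trajectory is stored as an array of shape (frames, features), that the wild-type and mutant trajectories have the same number of frames, and that the trajectory-wise distance is a per-feature Euclidean distance; the abstract does not specify the actual distance measure, the features, or the data layout, so the function names and the synthetic data below are hypothetical.

```python
import numpy as np

def average_difference(wt, mut):
    """Per-feature difference of the time-averaged values (mutant minus wild type)."""
    return mut.mean(axis=0) - wt.mean(axis=0)

def trajectory_distance(wt, mut):
    """Per-feature Euclidean distance between the two time series.

    Assumption: Euclidean distance is used here only for illustration.
    """
    return np.sqrt(((mut - wt) ** 2).sum(axis=0))

def time_varying_difference(wt, mut):
    """Frame-by-frame difference, kept as a time series of shape (frames, features)."""
    return mut - wt

# Synthetic stand-in for MD-derived local geometrical features (not real data).
rng = np.random.default_rng(0)
n_frames, n_features = 100, 3
wt = rng.normal(size=(n_frames, n_features))
mut = wt + np.array([0.5, 0.0, -0.25])  # constant per-feature shift

avg = average_difference(wt, mut)        # one value per feature
dist = trajectory_distance(wt, mut)      # one value per feature
series = time_varying_difference(wt, mut)  # time-series feature matrix
```

The first two functions collapse each trajectory into one value per feature, which suits conventional classifiers, while the third keeps the time axis and could serve as input to sequence models.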

Results: Our method (machine-learning techniques using time-series features) outperformed the benchmark methods, especially in terms of the balanced F1 score. In particular, deep-learning models gave the best prediction performance, with distinct improvements in balanced F1 score while accuracy was sustained.
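
To see why an F1 score can improve while accuracy stays the same, consider the following sketch. It assumes the standard binary F1 score (the harmonic mean of precision and recall); the abstract does not define "balanced F1" further, so the paper's exact variant may differ. The labels and predictions are made up for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def f1_score(y_true, y_pred):
    """Standard F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical imbalanced test set: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
pred_majority = [0] * 10                          # always predicts the majority class
pred_model = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]       # finds one of the two positives

acc_majority = accuracy(y_true, pred_majority)
acc_model = accuracy(y_true, pred_model)
f1_majority = f1_score(y_true, pred_majority)
f1_model = f1_score(y_true, pred_model)
```

Both predictors classify 8 of 10 samples correctly, yet only the second recovers any positive case, which is what the F1 score rewards.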

Conclusion: Our work highlights the effectiveness of characterizing the affinity change upon mutation. Furthermore, deep-learning techniques are well suited to handling the extracted time-series features. This study can lead to a deeper understanding of mutation-induced diseases and resistance, and can further guide innovative drug design.

Keywords: CNN, convolutional neural network; Deep learning; HMM, hidden Markov model; LSTM, long short-term memory; Local geometrical features; MD, molecular dynamics; MM/GBSA, molecular mechanics/generalized Born surface area; MM/PBSA, molecular mechanics/Poisson-Boltzmann surface area; Missense mutation; Molecular dynamics (MD) simulations; Mutation impact; Protein-ligand binding affinity; RF, random forest; RMSD, root-mean-square deviation; RNN, recurrent neural network; SASA, solvent accessible surface area; Time series features; WTP, wildtype protein; aacomp, amino acid composition descriptors; const, constitutional descriptors; ctd, composition transition and distribution descriptors; kappa, Kappa shape indices; paacomp, type 1 pseudo amino acid composition descriptors; top, topological descriptors.