Accurately Predicting Mutation-Caused Stability Changes from Protein Sequences Using Extreme Gradient Boosting

J Chem Inf Model. 2020 Apr 27;60(4):2388-2395. doi: 10.1021/acs.jcim.0c00064. Epub 2020 Mar 30.

Abstract

Accurately predicting the impact of point mutations on protein stability is crucial for protein design and engineering. In this study, we proposed a novel method, BoostDDG, that predicts stability changes upon point mutations from protein sequences using extreme gradient boosting. We extracted a comprehensive set of features from evolutionary information and predicted structures, and selected features by sequential forward selection. The features and parameters were optimized by homologue-based cross-validation to avoid overfitting. We found that 14 features from six groups led to the highest Pearson correlation coefficient (PCC), 0.535, which is consistent with the 0.540 obtained on an independent test. BoostDDG consistently outperformed other sequence-based methods on three precompiled test sets and on 7363 variants of two proteins (PTEN and TPMT). These results indicate that BoostDDG is a powerful tool for predicting stability changes upon point mutations from protein sequences.
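The selection procedure named in the abstract (sequential forward selection, scored by PCC under homologue-based cross-validation) can be illustrated with a minimal sketch. This is not the BoostDDG implementation: the abstract does not give the stopping rule, fold count, or feature definitions, so everything below is an assumption. In particular, a tiny nearest-neighbour regressor stands in for extreme gradient boosting to keep the example dependency-free, the data are synthetic, and all function names (`pearson`, `group_folds`, `knn_predict`, `cv_pcc`, `forward_select`) are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation undefined; treat as uninformative
    return cov / math.sqrt(vx * vy)


def group_folds(groups, k):
    """Assign whole groups (e.g. homologue clusters) to k folds.

    Samples sharing a group label never end up on both sides of a split.
    Returns a list of k lists of sample indices.
    """
    unique = sorted(set(groups))
    fold_of = {g: i % k for i, g in enumerate(unique)}
    folds = [[] for _ in range(k)]
    for idx, g in enumerate(groups):
        folds[fold_of[g]].append(idx)
    return folds


def knn_predict(train_X, train_y, row, n_neighbors=3):
    """Stand-in regressor (NOT gradient boosting): mean target of nearest rows."""
    ranked = sorted(range(len(train_X)), key=lambda j: math.dist(train_X[j], row))
    nearest = ranked[:n_neighbors]
    return sum(train_y[j] for j in nearest) / len(nearest)


def cv_pcc(X, y, groups, features, k=3):
    """Pooled out-of-fold PCC using only the given feature columns."""
    sub = [[row[f] for f in features] for row in X]
    preds = [0.0] * len(y)
    for test_idx in group_folds(groups, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        train_X = [sub[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            preds[i] = knn_predict(train_X, train_y, sub[i])
    return pearson(preds, y)


def forward_select(X, y, groups, k=3):
    """Greedily add the feature that most improves CV PCC; stop when none does."""
    remaining = list(range(len(X[0])))
    selected, best = [], float("-inf")
    while remaining:
        scored = [(cv_pcc(X, y, groups, selected + [f], k), f) for f in remaining]
        score, feature = max(scored)
        if score <= best:
            break  # assumed stopping rule: no candidate improves the score
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


# Synthetic demo data (hypothetical): column 0 carries signal, columns 1-3 are noise.
rng = random.Random(0)
n = 60
X = [[i / n] + [rng.gauss(0, 1) for _ in range(3)] for i in range(n)]
y = [2 * row[0] - 1 for row in X]   # mock stability change values
groups = [i % 6 for i in range(n)]  # six mock homologue clusters

selected, score = forward_select(X, y, groups)
print(selected, round(score, 3))
```

Splitting by group rather than by sample is the point of the homologue-based scheme: if related proteins appeared in both training and test folds, the cross-validation score would overstate how well the selected features generalize. In practice the stand-in regressor would be replaced by a gradient boosting model and the mock groups by sequence-similarity clusters.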

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Mutation*
  • Point Mutation*
  • Protein Stability
  • Proteins* / genetics

Substances

  • Proteins