XGBPRH: Prediction of Binding Hot Spots at Protein⁻RNA Interfaces Utilizing Extreme Gradient Boosting

Genes (Basel). 2019 Mar 21;10(3):242. doi: 10.3390/genes10030242.

Abstract

Hot spot residues at protein⁻RNA complexes are vitally important for investigating the underlying molecular recognition mechanism. Accurately identifying protein⁻RNA binding hot spots is critical for drug designing and protein engineering. Although some progress has been made by utilizing various available features and a series of machine learning approaches, these methods are still in the infant stage. In this paper, we present a new computational method named XGBPRH, which is based on an eXtreme Gradient Boosting (XGBoost) algorithm and can effectively predict hot spot residues in protein⁻RNA interfaces utilizing an optimal set of properties. Firstly, we download 47 protein⁻RNA complexes and calculate a total of 156 sequence, structure, exposure, and network features. Next, we adopt a two-step feature selection algorithm to extract a combination of 6 optimal features from the combination of these 156 features. Compared with the state-of-the-art approaches, XGBPRH achieves better performances with an area under the ROC curve (AUC) score of 0.817 and an F1-score of 0.802 on the independent test set. Meanwhile, we also apply XGBPRH to two case studies. The results demonstrate that the method can effectively identify novel energy hotspots.

Keywords: XGBoost; hot spots; protein–RNA interfaces; two-step feature selection.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites
  • Computational Biology / methods*
  • Models, Molecular
  • Protein Binding
  • Proteins / chemistry
  • Proteins / genetics
  • Proteins / metabolism*
  • RNA / chemistry
  • RNA / genetics
  • RNA / metabolism*
  • ROC Curve

Substances

  • Proteins
  • RNA