Predicting cysteine reactivity changes upon phosphorylation using XGBoost

FEBS Open Bio. 2024 Jan;14(1):51-62. doi: 10.1002/2211-5463.13737. Epub 2023 Nov 20.

Abstract

Cysteine reactivity serves as a significant indicator of protein function and can be affected by phosphorylation events. Experimental approaches have been developed to investigate this effect, but the scale is still relatively limited. Machine-learning approaches promise to accelerate the investigation of these phenomena. In this study, protein sequence information, distances to the closest phosphorylation sites, and the membership score of the intrinsically disordered region were used to represent the cysteine. Following the feature selection using an elastic net model, two groups of binary classifiers based on XGBoost were built to predict the occurrence and the direction of the reactivity change as a response to phosphorylation events, respectively. In addition, function enrichment analysis was performed on proteins/genes predicted to have reactivity changes. XGBoost performed the best in the independent test with AUC of 0.8192 and 0.9203 for the prediction of the change's occurrence and direction, respectively. The use of two binary classifiers successively resulted in an accuracy of 0.7568 in predicting whether reactivity would be unchanged, increased, or decreased. The enrichment analysis revealed the association of proteins carrying reactivity-changed cysteine residues with various disease-related pathways, particularly cancer, autosomal dominant diseases, and viral infections. Changes in cysteine reactivity influenced by phosphorylation are site-specific and can be predicted by XGBoost algorithms. Our model provides an efficient alternative way to explore the cysteine reactivity upon phosphorylation at the proteome-wide level, facilitating the investigation of protein functions and their clinical insights. Our code is available on GitHub (https://github.com/DarinaOsamu/predictors-of-cysteine-reactivity-changes).

Keywords: XGBoost; cysteine reactivity; machine learning; phosphorylation proteins; protein functions.

Publication types

  • Research Support, Non-U.S. Gov't
  • Comment

MeSH terms

  • Algorithms*
  • Cysteine* / chemistry
  • Machine Learning
  • Phosphorylation
  • Proteome

Substances

  • Cysteine
  • Proteome