S-SulfPred: A sensitive predictor to capture S-sulfenylation sites based on a resampling one-sided selection undersampling-synthetic minority oversampling technique

J Theor Biol. 2017 Jun 7;422:84-89. doi: 10.1016/j.jtbi.2017.03.031. Epub 2017 Apr 12.

Abstract

Protein S-sulfenylation is a reversible post-translational modification involving covalent attachment of a hydroxyl group to the thiol group of cysteine residues; it is involved in various biological processes, including cell signaling, stress responses, and protein function. Herein we present S-SulfPred, a support vector machine (SVM)-based model that captures potential S-sulfenylation sites and can improve the efficiency and relevance of experimental identification of protein S-sulfenylation sites. One-sided selection (OSS) undersampling and synthetic minority oversampling technique (SMOTE) oversampling were combined to construct balanced training datasets; in an independent test, this combination performed better than either OSS or SMOTE alone. The best combination of position-specific amino acid propensity (PSAAP) and five physicochemical properties of amino acids was selected to optimize predictor performance. S-SulfPred achieves an average sensitivity of 74.62% and an average specificity of 71.62% on independent datasets. Compared with other published tools, S-SulfPred attains both higher sensitivity and higher specificity. We not only propose a highly accurate method to predict protein S-sulfenylation sites, but also provide insights that could improve the efficiency of other bioinformatics tools.
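The abstract does not spell out how OSS and SMOTE are chained. The following is a minimal pure-Python sketch under stated assumptions. OSS, as commonly described, keeps all minority samples plus one random majority sample, adds every majority sample that this reduced set misclassifies under 1-nearest-neighbour, and then drops majority samples that form Tomek links. SMOTE then creates synthetic minority vectors by interpolating between a minority sample and one of its k nearest minority neighbours. The order (OSS first), the 1:1 target ratio, k = 5, Euclidean distance, and the names `one_sided_selection`, `smote`, and `oss_then_smote` are all assumptions for illustration, not the authors' settings.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_sided_selection(majority, minority, rng):
    """Return the majority-class vectors kept by one-sided selection (OSS)."""
    # Step 1: C = all minority samples + one randomly chosen majority sample.
    seed_idx = rng.randrange(len(majority))
    keep = {seed_idx}
    # Step 2: 1-NN-classify every other majority sample against the initial C;
    # those whose nearest neighbour is a minority sample are misclassified
    # (they sit near the class boundary) and are moved into C.
    for i, x in enumerate(majority):
        if i == seed_idx:
            continue
        d_major = euclid(x, majority[seed_idx])
        d_minor = min(euclid(x, p) for p in minority)
        if d_minor < d_major:
            keep.add(i)
    # Step 3: find Tomek links inside C (mutual nearest neighbours with
    # different labels) and drop the majority member of each link.
    pts = [(majority[i], 0, i) for i in sorted(keep)] + [(p, 1, None) for p in minority]
    nn = []
    for a, (va, _, _) in enumerate(pts):
        others = [b for b in range(len(pts)) if b != a]
        nn.append(min(others, key=lambda b: euclid(va, pts[b][0])))
    tomek_major = {
        pts[a][2]
        for a in range(len(pts))
        if pts[a][1] == 0 and pts[nn[a]][1] == 1 and nn[nn[a]] == a
    }
    return [majority[i] for i in sorted(keep - tomek_major)]


def smote(minority, n_new, k, rng):
    """Generate n_new synthetic minority vectors by SMOTE interpolation."""
    if n_new > 0 and len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclid(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): point lies on the segment x -> nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def oss_then_smote(majority, minority, k=5, seed=0):
    """Undersample the majority with OSS, then oversample the minority to parity."""
    rng = random.Random(seed)
    kept = one_sided_selection(majority, minority, rng)
    new = smote(minority, max(0, len(kept) - len(minority)), k, rng)
    return kept, minority + new


# Toy usage: 200 negative (non-site) and 20 positive (site) 2-D feature vectors.
data_rng = random.Random(1)
neg = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
pos = [[data_rng.gauss(1.5, 1), data_rng.gauss(1.5, 1)] for _ in range(20)]
neg_bal, pos_bal = oss_then_smote(neg, pos)
print(len(neg_bal), len(pos_bal))
```

The sketch computes all pairwise distances, so it is quadratic in the number of samples; it is meant to show the logic, not to scale. For real datasets, the imbalanced-learn library provides `OneSidedSelection` and `SMOTE` resamplers that implement the same two techniques.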
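The abstract names position-specific amino acid propensity (PSAAP) as a feature but does not define it. A frequently used formulation, assumed here, scores residue a at window position j as its frequency at j among positive (S-sulfenylated) windows minus its frequency at j among negative windows, and encodes each window as one score per position. The window strings, the 20-letter alphabet, the 0.0 score for non-standard or padding residues, and the names `position_propensity` and `encode` are hypothetical; the paper's exact definition may differ.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def position_propensity(pos_windows, neg_windows):
    """Per-position table: frequency in positive windows minus frequency in negative windows."""
    length = len(pos_windows[0])
    table = []
    for j in range(length):
        pos_counts = Counter(w[j] for w in pos_windows)
        neg_counts = Counter(w[j] for w in neg_windows)
        table.append({
            aa: pos_counts[aa] / len(pos_windows) - neg_counts[aa] / len(neg_windows)
            for aa in AMINO_ACIDS
        })
    return table


def encode(window, table):
    """Map a window to one propensity value per position (0.0 for residues outside the alphabet)."""
    return [table[j].get(aa, 0.0) for j, aa in enumerate(window)]


# Toy usage: 3-residue windows centred on the candidate cysteine.
table = position_propensity(["ACA", "ACD"], ["DCA", "DCD"])
print(encode("ACX", table))
```

Because the table is estimated from labelled windows, it should be built from the training data only; computing it on test windows would leak label information into the features.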
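Sensitivity and specificity are the standard confusion-matrix rates, Sn = TP/(TP + FN) and Sp = TN/(TN + FP): the fraction of true S-sulfenylation sites that are recovered, and the fraction of non-sites that are correctly rejected. The abstract reports averages over independent datasets without stating how they were averaged; this short sketch assumes a simple mean of per-dataset values, and the helper names `sensitivity_specificity` and `averaged` are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = S-sulfenylation site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def averaged(test_sets):
    """Mean sensitivity and specificity over several (y_true, y_pred) test sets."""
    scores = [sensitivity_specificity(t, p) for t, p in test_sets]
    n = len(scores)
    return sum(s for s, _ in scores) / n, sum(sp for _, sp in scores) / n


# Toy usage: TP = 2, FN = 1, TN = 3, FP = 1.
sn, sp = sensitivity_specificity([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1, 0])
print(sn, sp)
```

Reporting both rates matters here because the datasets are imbalanced: a predictor that labels every cysteine as a non-site would reach high specificity with zero sensitivity.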

Keywords: OSS; PSAAP; Prediction; S-sulfenylation; SMOTE; SVM.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cysteine / genetics
  • Cysteine / metabolism
  • Databases, Protein*
  • Protein Processing, Post-Translational / physiology*
  • Sequence Analysis, Protein / methods*
  • Software*
  • Support Vector Machine*

Substances

  • Cysteine