Computational identification of protein S-sulfenylation sites by incorporating the multiple sequence features information

Md Mehedi Hasan; Dianjing Guo; Hiroyuki Kurata

doi:10.1039/c7mb00491e

Mol Biosyst. 2017 Nov 21;13(12):2545-2550. doi: 10.1039/c7mb00491e.

Authors

Md Mehedi Hasan¹, Dianjing Guo, Hiroyuki Kurata

Affiliation

¹ Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan. mehedicau@hotmail.com.

PMID: 28990628

Abstract

Cysteine S-sulfenylation is a major type of posttranslational modification that contributes to the regulation of protein structure and function in many cellular processes. Experimental identification of S-sulfenylation sites is challenging because the modified proteins are low in abundance and the available experimental methods are inefficient. Computational identification of S-sulfenylation sites is an alternative strategy for annotating the S-sulfenylated proteome. In this study, a novel computational predictor, SulCysSite, was developed for accurate prediction of S-sulfenylation sites based on multiple sequence features: amino acid index properties, binary amino acid codes, position-specific scoring matrices, and compositions of profile-based amino acids. A random forest classifier was used to train the SulCysSite prediction model. The final SulCysSite model achieved an AUC value of 0.819 in a 10-fold cross-validation test and outperformed other existing computational predictors. In addition, interpretable rules (i.e. feature combinations) that characterize S-sulfenylation sites were extracted from the SulCysSite predictive model. SulCysSite is a useful computational resource for the prediction of S-sulfenylation sites. The online interface and datasets are publicly available at .
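
As an illustration of one of the feature types named in the abstract, the sketch below shows how binary amino acid codes might be computed for a sequence window centered on a cysteine, and how an AUC value can be calculated from labels and prediction scores. It is not the SulCysSite implementation: the function names, the flank size of 10 residues, and the "-" padding character are assumptions made for this example, and the random forest training step is not shown.

```python
# Minimal sketch, not the authors' code. The window size, padding
# character, and all function names are assumptions for illustration.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cysteine_windows(sequence, flank=10, pad="-"):
    """Yield (1-based position, window) for every cysteine in the sequence.

    Each window has length 2 * flank + 1 with the cysteine in the center;
    positions beyond the sequence ends are filled with the pad character.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "C":
            yield i + 1, padded[i:i + 2 * flank + 1]


def binary_encode(window):
    """Encode each residue as a 20-dimensional 0/1 vector and concatenate.

    Pad characters and non-standard residues become all-zero vectors.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    for position, window in cysteine_windows("MKCAAC", flank=2):
        print(position, window, sum(binary_encode(window)))
```

In a full pipeline, vectors of this kind would be combined with the other feature types and passed to a classifier, and the AUC would be computed on the held-out fold of each cross-validation round.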

MeSH terms

Algorithms
Amino Acid Sequence
Computational Biology / methods*
Humans
Position-Specific Scoring Matrices
Protein Processing, Post-Translational
Software
Support Vector Machine