C-iSUMO: A sumoylation site predictor that incorporates intrinsic characteristics of amino acid sequences

Comput Biol Chem. 2020 Feb 19:87:107235. doi: 10.1016/j.compbiolchem.2020.107235. Online ahead of print.

Abstract

Post-translational modifications are considered important molecular interactions in protein science. One of these modifications is "sumoylation", whose computational detection has recently become a challenge. In this paper, we propose a new computational predictor that uses the sine and cosine of backbone torsion angles, together with the accessible surface area, to predict sumoylation sites. These features were computed for all the proteins in our benchmark dataset, and a training matrix consisting of sumoylation and non-sumoylation sites was then created. The training matrix was balanced by undersampling the majority class (non-sumoylation sites) using the NearMiss method. Finally, an AdaBoost classifier was used to discriminate between sumoylation and non-sumoylation sites. We named our predictor "C-iSumo" because of its effective use of circular functions. C-iSumo was compared with another predictor and outperformed it on statistical metrics such as sensitivity (0.734), accuracy (0.746) and Matthews correlation coefficient (0.494).
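As an illustration of two ideas mentioned above, the following minimal Python sketch shows how torsion angles might be turned into sine and cosine features, and how sensitivity, accuracy and the Matthews correlation coefficient are computed from confusion-matrix counts. It is not the authors' implementation: the function names, the five-value feature layout and the use of degrees are assumptions made for illustration only. The sine/cosine mapping matters because torsion angles are periodic, so -180° and 180° describe the same conformation and should yield the same features.

```python
import math


def residue_features(phi_deg, psi_deg, asa):
    """Build a hypothetical per-residue feature vector.

    Assumed layout (not taken from the paper):
    [sin(phi), cos(phi), sin(psi), cos(psi), asa]
    Angles are assumed to be given in degrees.
    """
    features = []
    for angle_deg in (phi_deg, psi_deg):
        angle_rad = math.radians(angle_deg)
        # Sine and cosine remove the artificial jump at +/-180 degrees.
        features.append(math.sin(angle_rad))
        features.append(math.cos(angle_rad))
    features.append(asa)
    return features


def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, accuracy, mcc) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sensitivity, accuracy, mcc
```

With this encoding, `residue_features(180.0, 0.0, 0.3)` and `residue_features(-180.0, 0.0, 0.3)` agree up to floating-point rounding, which is the property that raw angle values lack.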

Keywords: AdaBoost; Amino acids; Computational prediction; Proteins; Sumoylation.