An application of topological data analysis in predicting sumoylation sites

PeerJ. 2023 Oct 12;11:e16204. doi: 10.7717/peerj.16204. eCollection 2023.

Abstract

Sumoylation is a reversible post-translational modification that regulates important biochemical functions of proteins. The protein alterations caused by sumoylation are associated with the incidence of some human diseases. Therefore, identifying sumoylation sites in proteins may provide direction for mechanistic research and drug development. Here, we propose a new computational approach for identifying sumoylation sites using an encoding method based on topological data analysis. The features of our model capture key physical and biological properties of proteins at multiple scales. In 10-fold cross-validation, our model achieved a sensitivity (Sn) of 96.45%, an accuracy (Acc) of 94.65%, a Matthews correlation coefficient (MCC) of 0.8946, and an area under the curve (AUC) of 0.99. Using only topological features, the proposed predictor achieves the best MCC and AUC compared with other published methods. Our results suggest that topological information is an additional parameter that can assist in the prediction of sumoylation sites and provides a novel perspective for further research on protein sumoylation.
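
For readers less familiar with the reported metrics, the following minimal sketch shows how Sn, Acc, and MCC are conventionally computed from the counts of a binary confusion matrix. The function name `binary_metrics` and the counts in the example are hypothetical and chosen only for illustration; they are not taken from the study. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return Sn, Acc, and MCC for a binary classifier.

    tp/fn: positive (sumoylated) sites predicted correctly/incorrectly.
    tn/fp: negative sites predicted correctly/incorrectly.
    """
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Acc": acc, "MCC": mcc}


# Hypothetical counts for illustration only (not data from the paper).
example = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print(example)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when positive and negative sites are imbalanced.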

Keywords: Feature extraction; Persistent homology; Sumoylation; Topological data analysis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology* / methods
  • Humans
  • Protein Processing, Post-Translational
  • Proteins / chemistry
  • Sumoylation*

Substances

  • Proteins

Grants and funding

This work was supported by a grant (No. 12071051) from the NSFC and the State Key Laboratory of Structural Analysis, Optimization and CAE Software for Industrial Equipment. No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.