SuccSPred2.0: A Two-Step Model to Predict Succinylation Sites Based on Multifeature Fusion and Selection Algorithm

J Comput Biol. 2022 Oct;29(10):1085-1094. doi: 10.1089/cmb.2022.0109. Epub 2022 Jun 17.

Abstract

Protein succinylation is a novel type of post-translational modification identified within the past decade. Experiments have verified that it plays an important role in biological structure and function. However, identifying succinylation sites through wet-lab experiments is time-consuming and laborious, and traditional techniques cannot keep pace with the rapid growth of biological sequence data sets. In this study, a new computational method named SuccSPred2.0 was proposed to identify succinylation sites in protein sequences based on multifeature fusion and the maximal information coefficient (MIC) method. SuccSPred2.0 was implemented as a two-step strategy. First, high-dimensional features were reduced by linear discriminant analysis to prevent overfitting. Subsequently, the MIC method was employed to select the important features, which were used in combination with classifiers to predict succinylation sites. In experiments using 10-fold cross-validation and independent test data sets, SuccSPred2.0 obtained promising improvements. These comparative experiments showed that SuccSPred2.0 was superior to previous tools in identifying succinylation sites in the given proteins.
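The feature selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation and it does not compute the full MIC, which optimizes grid partitions. As a simplified stand-in, it scores each feature by the best normalized mutual information over a few equal-frequency binnings and keeps the top-ranked features. The linear discriminant analysis step is omitted. All function names (mutual_information, equal_frequency_bins, mic_like_score, select_top_features) and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(ys)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y)))
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins by rank (ties split by order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mic_like_score(values, labels, max_bins=4):
    """Simplified MIC-style score (assumption: not the full MIC algorithm).

    Takes the maximum, over a few bin counts, of the mutual information
    normalized by log2(min(number of bins, number of classes)).
    """
    n_classes = len(set(labels))
    best = 0.0
    for b in range(2, max_bins + 1):
        norm = math.log2(min(b, n_classes))
        if norm > 0:
            mi = mutual_information(equal_frequency_bins(values, b), labels)
            best = max(best, mi / norm)
    return best


def select_top_features(rows, labels, k):
    """Rank feature columns by score and return the k best indices and all scores."""
    n_features = len(rows[0])
    scores = [
        mic_like_score([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
rows = [
    [0.1, 0.10], [0.2, 0.90], [0.3, 0.20], [0.4, 0.80],
    [0.6, 0.15], [0.7, 0.85], [0.8, 0.25], [0.9, 0.75],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = succinylated site, 0 = non-site
top, scores = select_top_features(rows, labels, k=1)
print(top, scores)  # feature 0 ranks first
```

In practice the selected columns would then be passed to a classifier and evaluated, for example with 10-fold cross-validation as in the abstract. A dedicated MIC library would replace the binning heuristic used here.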

Keywords: feature representation; lysine succinylation; maximal information coefficient; post-translational modification; pseudo amino acid composition; system biology.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Lysine* / metabolism
  • Protein Processing, Post-Translational
  • Proteins / chemistry

Substances

  • Proteins
  • Lysine