Analysis and prediction of human acetylation using a cascade classifier based on support vector machine

BMC Bioinformatics. 2019 Jun 17;20(1):346. doi: 10.1186/s12859-019-2938-7.

Abstract

Background: Lysine acetylation is a widespread, reversible post-translational modification that plays a crucial role in certain biological activities. To better understand its mechanism, acetylation sites in proteins must be identified accurately. Computational methods are popular because they are faster and more convenient than experimental methods. In this study, we proposed a new computational method for predicting acetylation sites in human proteins. The method combines sequence and structural features that characterize acetylated lysine sites well: physicochemical property (PCP), position-specific scoring matrix (PSSM), auto covariation (AC), residue composition (RC), secondary structure (SS) and accessible surface area (ASA). A two-step feature selection combining minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS) was then applied. Finally, we trained a cascade classifier based on the support vector machine (SVM), which addressed the imbalance between positive and negative samples while covering the information in all negative samples.
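The abstract does not spell out how the cascade is built, so the following is only a minimal sketch of one common design, under stated assumptions: the negatives are split into chunks no larger than the positive set (so every negative is used exactly once), one stage is trained per chunk against all positives, and a site is predicted as acetylated only if every stage accepts it. The names `balanced_chunks`, `train_cascade`, `cascade_predict` and `midpoint_trainer` are hypothetical, and the toy threshold trainer merely stands in for the SVM used in the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], bool]  # True means "predicted acetylated"
Trainer = Callable[[List[Vector], List[Vector]], Classifier]


def balanced_chunks(negatives: List[Vector], chunk_size: int) -> List[List[Vector]]:
    """Split negatives into consecutive chunks of at most chunk_size items.

    Every negative sample lands in exactly one chunk, so no negative
    information is discarded (assumed partitioning scheme).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [negatives[i:i + chunk_size] for i in range(0, len(negatives), chunk_size)]


def train_cascade(positives: List[Vector], negatives: List[Vector],
                  train_stage: Trainer) -> List[Classifier]:
    """Train one stage per negative chunk, each against all positives."""
    chunks = balanced_chunks(negatives, len(positives))
    return [train_stage(positives, chunk) for chunk in chunks]


def cascade_predict(stages: List[Classifier], x: Vector) -> bool:
    """Accept x only if every stage accepts it; all() stops at the first rejection."""
    return all(stage(x) for stage in stages)


def midpoint_trainer(pos: List[Vector], neg: List[Vector]) -> Classifier:
    """Toy stand-in for an SVM: threshold the first feature halfway
    between the two class means (hypothetical, for illustration only)."""
    pos_mean = sum(v[0] for v in pos) / len(pos)
    neg_mean = sum(v[0] for v in neg) / len(neg)
    threshold = (pos_mean + neg_mean) / 2
    if pos_mean >= neg_mean:
        return lambda x: x[0] > threshold
    return lambda x: x[0] < threshold


# Made-up one-dimensional feature vectors: 2 positives, 5 negatives.
positives = [[0.9], [0.8]]
negatives = [[0.1], [0.2], [0.3], [0.0], [0.4]]
stages = train_cascade(positives, negatives, midpoint_trainer)
```

In practice the `Trainer` argument would wrap a real SVM implementation; the cascade logic stays the same because each stage only needs to expose an accept/reject decision.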

Results: On an independent dataset, the method achieved a specificity of 72.19% and a sensitivity of 76.71%, indicating that the cascade SVM classifier outperforms a single SVM classifier.
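For reference, the two reported metrics follow the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below computes both from binary labels; the function name and the example labels are illustrative assumptions, not data from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for labels where 1 = acetylated, 0 = not."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels: 4 positive sites and 4 negative sites.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Reporting both values matters on imbalanced data, where overall accuracy can look high even if most positive sites are missed.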

Conclusions: In addition to analyzing the experimental results, we performed a systematic and comprehensive analysis of the acetylation data.

Keywords: Acetylation sites; Cascade classifier; Human; Lysine; Sequence features; Structural feature; Support vector machine; Systematic and comprehensive analysis.

MeSH terms

  • Acetylation
  • Amino Acid Sequence
  • Animals
  • Computational Biology / methods*
  • Databases, Protein
  • Gene Ontology
  • Humans
  • Lysine / chemistry
  • Mice
  • Molecular Sequence Annotation
  • Position-Specific Scoring Matrices
  • Protein Processing, Post-Translational
  • Protein Structure, Secondary
  • Proteins / chemistry
  • Proteins / metabolism
  • Rats
  • Support Vector Machine*

Substances

  • Proteins
  • Lysine