predML-Site: Predicting Multiple Lysine PTM Sites With Optimal Feature Representation and Data Imbalance Minimization

IEEE/ACM Trans Comput Biol Bioinform. 2022 Nov-Dec;19(6):3624-3634. doi: 10.1109/TCBB.2021.3114349. Epub 2022 Dec 8.

Abstract

Identifying post-translational modifications (PTMs) is crucial in computational proteomics, cell biology, pathogenesis, and drug development because of their role in many biomolecular mechanisms. Computational methods for predicting multiple PTMs at the same lysine residue, often referred to as K-PTM, are still evolving. This paper presents a novel computational tool, predML-Site, for predicting K-PTMs such as acetylation, crotonylation, methylation, and succinylation in an uncategorized peptide sample that may carry a single modification, multiple modifications, or none. For informative feature representation, multiple sequence encoding schemes, namely sequence coupling, binary encoding, k-spaced amino acid pairs, and amino acid factors, have been used together with ANOVA and incremental feature selection. As the core predictor, a cost-sensitive SVM classifier has been adopted, which effectively mitigates the effect of class-label imbalance in the dataset. predML-Site predicts multi-label PTM sites with 84.18% accuracy using the top 91 features. It also achieves an aiming rate of 85.34% and a coverage rate of 86.58%, which are much better than those of the existing state-of-the-art predictors under the same rigorous validation test. This performance indicates that predML-Site can be used as a supportive tool for further K-PTM study. For the convenience of experimental scientists, predML-Site has been deployed as a user-friendly web server at http://103.99.176.239/predML-Site.
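Two of the encoding schemes named in the abstract, binary encoding and k-spaced amino acid pairs, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `binary_encode` and `cksaap_encode`, the `k_max` parameter, the 20-letter alphabet, and the treatment of non-standard or padding residues are all assumptions made for illustration; the abstract does not state the window length or the range of k used in predML-Site.

```python
from itertools import product

# The 20 standard amino acids; this ordering is an assumption for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def binary_encode(peptide):
    """One-hot encode each residue as a 20-dimensional 0/1 vector, flattened."""
    features = []
    for residue in peptide:
        vec = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:  # assumption: unknown/padding residues stay all-zero
            vec[idx] = 1
        features.extend(vec)
    return features


def cksaap_encode(peptide, k_max=2):
    """Frequencies of residue pairs separated by k = 0..k_max other residues."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_windows = len(peptide) - k - 1  # number of k-spaced pairs in the peptide
        for i in range(max(n_windows, 0)):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # assumption: pairs with non-standard residues are skipped
                counts[pair] += 1
        denom = n_windows if n_windows > 0 else 1
        features.extend(counts[p] / denom for p in PAIRS)
    return features


if __name__ == "__main__":
    window = "AKLMKAK"  # hypothetical lysine-containing peptide fragment
    print(len(binary_encode(window)))   # 7 residues x 20 = 140 features
    print(len(cksaap_encode(window)))   # 3 values of k x 400 pairs = 1200 features
```

Vectors like these would typically be concatenated with the other encodings before feature ranking and classification; how predML-Site combines them in detail is described in the full paper rather than in the abstract.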

MeSH terms

  • Algorithms*
  • Amino Acids / chemistry
  • Computational Biology / methods
  • Lysine* / chemistry
  • Peptides

Substances

  • Lysine
  • Amino Acids
  • Peptides