Mini-review: Recent advances in post-translational modification site prediction based on deep learning

Comput Struct Biotechnol J. 2022 Jun 30;20:3522-3532. doi: 10.1016/j.csbj.2022.06.045. eCollection 2022.

Abstract

Post-translational modifications (PTMs) play a significant role in regulating protein structure, activity, and function, and are closely linked to numerous diseases. The identification of PTM sites is therefore crucial for understanding the mechanisms of cell biology and for disease therapy. Compared with traditional machine learning methods, deep learning approaches to PTM prediction provide accurate and rapid screening, guiding downstream wet-lab experiments toward the most promising candidate sites for focused study. In this paper, we review recent deep learning work on identifying phosphorylation, acetylation, ubiquitination, and other PTM types. In addition, we summarize PTM databases and discuss future directions with critical insights.

Keywords: AAindex, Amino acid index; ATP, Adenosine triphosphate; AUC, Area under the curve; Ac, Acetylation; BE, Binary encoding; BLOSUM, Blocks substitution matrix; Bi-LSTM, Bidirectional LSTM; CKSAAP, Composition of k-spaced amino acid pairs; CNN, Convolutional neural network; CNNOH, CNN with the one-hot encoding; CNNWE, CNN with the word-embedding encoding; CNNrgb, CNN red green blue; CV, Cross-validation; DC-CNN, Densely connected convolutional neural network; DL, Deep learning; DNNs, Deep neural networks; Deep learning; E. coli, Escherichia coli; EBGW, Encoding based on grouped weight; EGAAC, Enhanced grouped amino acids content; IG, Information gain; K, Lysine; KNN, k nearest neighbor; LASSO, Least absolute shrinkage and selection operator; LSTM, Long short-term memory; LSTMWE, LSTM with the word-embedding encoding; M. musculus, Mus musculus; MDC, Modular densely connected convolutional networks; MDCAN, Multilane dense convolutional attention network; ML, Machine learning; MLP, Multilayer perceptron; MMI, Multivariate mutual information; Machine learning; Mass spectrometry; NMBroto, Normalized Moreau-Broto autocorrelation; P, Proline; PSP, PhosphoSitePlus; PSSM, Position-specific scoring matrix; PTM, Post-translational modifications; Ph, Phosphorylation; Post-translational modification; Prediction; PseAAC, Pseudo-amino acid composition; R, Arginine; RF, Random forest; RNN, Recurrent neural network; ROC, Receiver operating characteristic; S, Serine; S. typhimurium, Salmonella typhimurium; S. cerevisiae, Saccharomyces cerevisiae; SE, Squeeze and excitation; SEV, Split to Equal Validation; ST, Source and target; SUMO, Small ubiquitin-like modifier; SVM, Support vector machines; T, Threonine; Ub, Ubiquitination; Y, Tyrosine; ZSL, Zero-shot learning.

Publication types

  • Review