Prediction of Protein Acetylation Sites using Kernel Naive Bayes Classifier Based on Protein Sequences Profiling

Bioinformation. 2018 May 31;14(5):213-218. doi: 10.6026/97320630014213. eCollection 2018.

Abstract

Lysine acetylation is one of the most important types of protein post-translational modification (PTM); it is involved in many significant cellular processes and in severe diseases. Experimental identification of acetylated sites is painstaking, time-consuming and expensive, so there is considerable interest in computational approaches that reliably predict acetylation sites from protein sequences. Feature selection from protein sequences plays a significant role in acetylation site prediction. We describe an improved feature selection approach for acetylation site prediction based on a kernel naive Bayes classifier (KNBC). We show that a KNBC built on features chosen by the new feature selection method outperforms existing methods for identifying acetylation sites. With the optimal window size of 47, the proposed method achieves a sensitivity of 80.71%, a specificity of 93.39%, an ACC (accuracy) of 76.73%, an MCC (Matthews correlation coefficient) of 41.37% and an AUC (area under the ROC curve) of 83.0%. Thus the kernel naive Bayes classifier is a useful tool for acetylation site prediction.
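The abstract does not state how the kernel densities are estimated, so the following is only a minimal sketch of a kernel naive Bayes classifier under explicit assumptions: every feature is numeric (for example a CKSAAP value), each class-conditional feature density is a Gaussian kernel density estimate, and each bandwidth follows a normal-reference rule of thumb, 1.06 * std * n^(-1/5). The names `KernelNaiveBayes`, `normal_reference_bandwidth`, `log_kde` and `binary_metrics` are hypothetical, and this is not the authors' implementation. The `binary_metrics` helper computes the sensitivity, specificity, accuracy and MCC used to report results, from the confusion-matrix counts.

```python
import math
from collections import defaultdict


def normal_reference_bandwidth(values):
    """Rule-of-thumb bandwidth 1.06 * std * n^(-1/5) (an assumption, not from the paper)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    h = 1.06 * math.sqrt(var) * n ** (-1 / 5)
    return h if h > 0 else 1e-3  # guard for constant features


def log_kde(x, samples, h):
    """log of (1/(n*h)) * sum_i phi((x - x_i) / h), phi = standard normal pdf."""
    terms = [-0.5 * ((x - s) / h) ** 2 for s in samples]
    m = max(terms)  # log-sum-exp for numerical stability
    log_sum = m + math.log(sum(math.exp(t - m) for t in terms))
    return log_sum - math.log(len(samples) * h * math.sqrt(2 * math.pi))


class KernelNaiveBayes:
    """Naive Bayes whose per-feature class densities are Gaussian KDEs."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.classes_ = sorted(by_class)
        self.log_prior_ = {c: math.log(len(by_class[c]) / len(y)) for c in self.classes_}
        # Store each class's training values column by column, one bandwidth per column.
        self.columns_ = {c: [list(col) for col in zip(*by_class[c])] for c in self.classes_}
        self.bandwidths_ = {
            c: [normal_reference_bandwidth(col) for col in self.columns_[c]]
            for c in self.classes_
        }
        return self

    def log_scores(self, x):
        """Unnormalized log posterior: log prior + sum of per-feature log densities."""
        return {
            c: self.log_prior_[c]
            + sum(
                log_kde(xj, self.columns_[c][j], self.bandwidths_[c][j])
                for j, xj in enumerate(x)
            )
            for c in self.classes_
        }

    def predict(self, X):
        predictions = []
        for x in X:
            scores = self.log_scores(x)
            predictions.append(max(scores, key=scores.get))
        return predictions


def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, accuracy and MCC from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy data: class 0 clusters near (0, 0), class 1 near (5, 5).
X = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y = [0, 0, 0, 1, 1, 1]
model = KernelNaiveBayes().fit(X, y)
print(model.predict([[0.05, 0.0], [5.1, 5.0]]))  # [0, 1]
print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Working in log space, with a log-sum-exp over the kernel terms, keeps class scores finite even when a test point lies far from every training sample of a class, where the raw density would underflow to zero.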

Keywords: Acetylation; Binary Encoding; CKSAAP Encoding; Kernel Naive Bayes Classifier; Kruskal-Wallis test; Protein Sequences.
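The keywords list CKSAAP encoding, but the abstract gives no encoding details beyond the window size of 47. The sketch below shows one common formulation of CKSAAP (composition of k-spaced amino acid pairs), under assumptions: a 47-residue window is centred on each lysine and padded at the sequence termini with a hypothetical "X" symbol, and for each gap k from 0 to an assumed `k_max` of 3 the frequency of each of the 400 ordered residue pairs separated by k residues is normalised by the number of such pairs in the window. The helper names (`lysine_windows`, `cksaap`) and the `k_max` default are illustrative; the paper's exact settings may differ, and binary encoding and Kruskal-Wallis selection are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def lysine_windows(sequence, size=47, pad="X"):
    """Return (position, window) for every K, padded at the termini (size assumed odd)."""
    half = size // 2
    padded = pad * half + sequence + pad * half
    # padded[i + half] is sequence[i], so padded[i:i + size] is centred on residue i.
    return [(i, padded[i:i + size]) for i, aa in enumerate(sequence) if aa == "K"]


def cksaap(window, k_max=3):
    """Composition of k-spaced amino acid pairs for k = 0..k_max (400 values per k)."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(window) - k - 1  # number of residue pairs separated by k residues
        for i in range(total):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # pairs involving the pad symbol are not counted
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features


windows = lysine_windows("MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGK")
vector = cksaap(windows[0][1])
print(len(windows), len(vector))
```

With `k_max = 3` each window yields a 1,600-dimensional vector, which motivates a feature selection step such as the Kruskal-Wallis test named in the keywords before the classifier is trained.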