Phosphorylation site prediction with a modified k-nearest neighbor algorithm and BLOSUM62 matrix

Conf Proc IEEE Eng Med Biol Soc. 2005;2005:6075-8. doi: 10.1109/IEMBS.2005.1615878.

Abstract

Phosphorylation is one of the most important post-translational modifications of eukaryotic proteins. Experimental identification of protein kinase (PK) substrates and their phosphorylation sites is time-consuming and often limited by the availability of enzymatic reactions. Machine learning methods that predict phosphorylation sites and their specific kinases from primary sequence are therefore much needed: they provide fast, automatic annotations that can guide further experimental work. In this paper, we present a modified k-Nearest Neighbor (k-NN) method that uses the Manhattan distance for phosphorylation site prediction. BLOSUM62-based similarity scores between pairs of phosphorylation sites are used as the input vectors. Leave-one-out testing on two PK groups, PKA and CK2, shows that the method outperforms two existing methods, Scansite and NetPhosK, suggesting that it is a competitive computational approach in this area of bioinformatics.
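The abstract does not spell out the modification to k-NN or how the input vectors are built, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each candidate site is a fixed-length peptide window, represents a peptide by its vector of summed substitution scores against every training peptide, compares vectors with the Manhattan distance, and classifies by a plain majority vote. The function names, the peptides in `TOY_DATA`, and the `toy_score` function are all hypothetical; in particular, `toy_score` is a stand-in and does not contain BLOSUM62 values. A real BLOSUM62 lookup (for example, one loaded through a library such as Biopython) could be passed in through the `score` parameter.

```python
from collections import Counter


def toy_score(a, b):
    """Stand-in substitution score (NOT real BLOSUM62 values)."""
    return 4 if a == b else -1


def similarity(p, q, score=toy_score):
    """Sum of per-position substitution scores for two equal-length peptides."""
    if len(p) != len(q):
        raise ValueError("peptide windows must have equal length")
    return sum(score(a, b) for a, b in zip(p, q))


def featurize(peptide, references, score=toy_score):
    """Represent a peptide by its similarity to each reference peptide."""
    return [similarity(peptide, r, score) for r in references]


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


def knn_predict(query, train, k=3, score=toy_score):
    """Majority vote among the k training peptides nearest to the query."""
    refs = [pep for pep, _ in train]
    qv = featurize(query, refs, score)
    ranked = sorted(
        train,
        key=lambda item: manhattan(qv, featurize(item[0], refs, score)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3, score=toy_score):
    """Hold out each peptide in turn and predict it from the rest."""
    hits = 0
    for i, (pep, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(pep, rest, k, score) == label
    return hits / len(data)


# Hypothetical 5-residue windows centred near a serine; labels are illustrative.
TOY_DATA = [
    ("RRASV", "PKA"), ("RRGSL", "PKA"), ("KRASV", "PKA"),
    ("GEDSD", "other"), ("GEESD", "other"), ("AEDSD", "other"),
]

if __name__ == "__main__":
    print(knn_predict("RRASL", TOY_DATA, k=3))
    print(leave_one_out_accuracy(TOY_DATA, k=3))
```

Keeping the scoring function as a parameter separates the distance and voting logic from the choice of substitution matrix, so the same sketch could be rerun with real BLOSUM62 scores and real PKA or CK2 site windows without changing the classifier code.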