[Prediction of protein subcellular locations by ensemble of improved K-nearest neighbor]

Sheng Wu Gong Cheng Xue Bao. 2017 Apr 25;33(4):683-691. doi: 10.13345/j.cjb.160389.
[Article in Chinese]

Abstract

An Adaboost ensemble of improved K-nearest neighbor (KNN) classifiers is proposed for predicting protein subcellular locations. Each improved KNN classifier is built on one of three sequence feature vectors: amino acid composition, dipeptide composition, or pseudo amino acid composition. In the classification stage, the KNN classifier uses Blast alignment to assign the location. Under the jackknife test, the overall success rates on the CH317 and Gram1253 data sets are 92.4% and 93.1%, respectively. These results indicate that Adaboost combined with Blast-improved KNN classifiers is an effective method for predicting the subcellular locations of proteins.
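Two of the three sequence features named in the abstract have standard textbook definitions, so a minimal sketch may help make them concrete. The function names and the handling of non-standard residues below are assumptions made for illustration; the abstract does not describe the authors' implementation, and pseudo amino acid composition is omitted because it depends on physicochemical parameters not given here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def _standard_residues(seq):
    # Assumption: non-standard symbols (X, B, Z, gaps) are simply dropped.
    # This can join residues that were not adjacent in the original sequence.
    return "".join(r for r in seq.upper() if r in AMINO_ACIDS)


def amino_acid_composition(seq):
    """Return the 20 residue frequencies of seq, in AMINO_ACIDS order."""
    residues = _standard_residues(seq)
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the 400 adjacent-pair frequencies of seq, in DIPEPTIDES order."""
    residues = _standard_residues(seq)
    total = len(residues) - 1  # number of overlapping adjacent pairs
    if total < 1:
        raise ValueError("sequence needs at least two standard residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        counts[residues[i:i + 2]] += 1
    return [counts[d] / total for d in DIPEPTIDES]
```

Each function returns a fixed-length vector whose entries sum to 1, so sequences of different lengths can be compared directly by a distance-based classifier such as KNN.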

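Multiple similarity-alignment K-nearest neighbor (KNN) classifiers are combined with the Adaboost algorithm to predict protein subcellular locations. The similarity-alignment KNN algorithm takes amino acid composition, dipeptide composition, and pseudo amino acid composition, respectively, as protein sequence features, and uses Blast alignment in the KNN decision stage to determine the subcellular location of a protein. Under the jackknife test, with the Adaboost ensemble classification algorithm extracting these three kinds of protein sequence features, the highest prediction success rates on the data sets CH317 and Gram1253 are 92.4% and 93.1%, respectively. The results show that the Adaboost ensemble of improved KNN classifiers is an effective method for predicting protein subcellular locations.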
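The jackknife test reported in the abstract is a leave-one-out evaluation: each protein is held out in turn, predicted from all the others, and the success rate is the fraction predicted correctly. The sketch below illustrates that procedure with a plain KNN that votes by Euclidean distance over feature vectors. This is a stand-in chosen for illustration, not the authors' classifier, which uses Blast alignment in the decision stage and an Adaboost ensemble; the function names and the default `k` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors nearest to x."""
    # Stand-in decision rule: Euclidean distance, not Blast similarity.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def jackknife_success_rate(X, y, k=3):
    """Leave-one-out success rate of knn_predict over the data set (X, y)."""
    if len(X) != len(y) or len(X) < 2:
        raise ValueError("need at least two labelled samples")
    correct = 0
    for i in range(len(X)):
        # Hold out sample i and predict it from every other sample.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)
```

Here `X` would be a list of feature vectors (for example composition vectors of equal length) and `y` the matching list of location labels. The loop costs one prediction per protein, which is practical for data sets of the size named in the abstract.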

Keywords: Adaboost; K-nearest neighbor; basic local alignment search tool (Blast); protein sequence characteristics; subcellular locations.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Cluster Analysis*
  • Databases, Protein
  • Proteins / chemistry*

Substances

  • Proteins