[Prediction of protein subcellular localization based on multilayer sparse coding]

Sheng Wu Gong Cheng Xue Bao. 2019 Apr 25;35(4):687-696. doi: 10.13345/j.cjb.180403.
[Article in Chinese]

Abstract

To provide a theoretical basis for better understanding the functions and properties of proteins, we proposed a simple and effective feature extraction method for protein sequences and used it to predict protein subcellular localization. First, we combined sparse coding with amino acid composition information to extract features from protein sequences. Next, we integrated these features by multilayer pooling over dictionaries of different sizes. Finally, we fed the extracted features into a support vector machine (SVM) classifier to evaluate the model. Under the jackknife test, the success rates on the ZD98, CH317 and Gram1253 datasets were 95.9%, 93.4% and 94.7%, respectively. These experiments show that the multilayer sparse coding method can markedly improve the accuracy of protein subcellular localization prediction.
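The feature extraction steps summarized in the abstract (amino acid composition, sparse coding, multilayer pooling over several dictionary sizes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how local descriptors are formed, how dictionaries are trained, which sparse solver or pooling operator is used, or which dictionary sizes were chosen. Here we assume: local descriptors are the 20-dimensional amino acid composition of sliding windows (window 10, step 5, both hypothetical); codes are obtained by L1-regularized sparse coding solved with ISTA; random unit-norm dictionaries stand in for learned ones; codes are max-pooled over windows; and the dictionary sizes 16, 32 and 64 are hypothetical. All function names are illustrative.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def composition(segment):
    """20-dim amino acid composition (relative frequencies) of a segment."""
    v = np.zeros(len(AMINO_ACIDS))
    for aa in segment:
        if aa in AA_INDEX:  # skip non-standard residues such as X or U
            v[AA_INDEX[aa]] += 1
    total = v.sum()
    return v / total if total > 0 else v


def local_descriptors(seq, window=10, step=5):
    """Composition of overlapping windows, one row per window (hypothetical window/step)."""
    if len(seq) <= window:
        return composition(seq)[None, :]
    starts = range(0, len(seq) - window + 1, step)
    return np.array([composition(seq[s:s + window]) for s in starts])


def make_dictionary(n_atoms, dim=20, seed=0):
    """Random unit-norm atoms (rows); a stand-in for a learned dictionary."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((n_atoms, dim))
    return D / np.linalg.norm(D, axis=1, keepdims=True)


def sparse_code(X, D, lam=0.05, n_iter=200):
    """ISTA for min_A 0.5*||X - A D||_F^2 + lam*||A||_1 (one code row per descriptor)."""
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        Z = A - grad / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft-thresholding
    return A


def multilayer_features(seq, dict_sizes=(16, 32, 64)):
    """Global composition concatenated with max-pooled |codes| for each dictionary size."""
    X = local_descriptors(seq)
    parts = [composition(seq)]
    for k in dict_sizes:
        A = sparse_code(X, make_dictionary(k, seed=k))
        parts.append(np.abs(A).max(axis=0))  # max pooling over all windows
    return np.concatenate(parts)
```

With the default sizes, `multilayer_features(seq)` returns a 132-dimensional vector (20 composition values plus 16 + 32 + 64 pooled code magnitudes) for any protein sequence. Pooling each dictionary's codes separately and concatenating the results is one plausible reading of "multilayer pooling according to different sizes of dictionaries". In practice the dictionaries would be learned from training descriptors (for example with K-SVD or scikit-learn's `DictionaryLearning`) rather than drawn at random.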

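The success rates reported for ZD98, CH317 and Gram1253 come from the jackknife test, that is, leave-one-out cross-validation. Each of the n proteins is held out once, the classifier is trained on the remaining n − 1 proteins, and the success rate is the fraction of held-out proteins whose location is predicted correctly. The sketch below assumes scikit-learn's `SVC` with its default RBF kernel. The abstract does not give the kernel or hyperparameters the authors used, and the helper name `jackknife_accuracy` is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def jackknife_accuracy(X, y, **svm_params):
    """Leave-one-out success rate of an SVM; X has one feature row per protein."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i  # every protein except the i-th
        clf = SVC(**svm_params).fit(X[train], y[train])
        # Predict the held-out protein and count it if the label matches.
        correct += int(clf.predict(X[i:i + 1])[0] == y[i])
    return correct / n
```

Given a feature matrix with one row per protein and a vector of location labels, `jackknife_accuracy(X, y, C=10.0)` retrains the SVM n times. That cost is acceptable for data sets of a few hundred to a few thousand proteins. scikit-learn's `LeaveOneOut` splitter with `cross_val_score` gives the same result in fewer lines, while the explicit loop shows what the jackknife test actually does.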

Keywords: amino acid composition; multilayer pooling; sparse coding; subcellular localization prediction; support vector machine.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Computational Biology
  • Protein Transport
  • Proteins
  • Subcellular Fractions
  • Support Vector Machine*

Substances

  • Proteins