Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect

J Cell Biochem. 2002;84(2):343-8. doi: 10.1002/jcb.10030.

Abstract

A support vector machine (SVM), a type of learning machine, was applied to predict the subcellular location of proteins by incorporating the quasi-sequence-order effect (Chou [2000] Biochem. Biophys. Res. Commun. 278:477-483). In this study, proteins are classified into 12 groups: (1) chloroplast, (2) cytoplasm, (3) cytoskeleton, (4) endoplasmic reticulum, (5) extracellular, (6) Golgi apparatus, (7) lysosome, (8) mitochondria, (9) nucleus, (10) peroxisome, (11) plasma membrane, and (12) vacuole, which account for most organelles and subcellular compartments in an animal or plant cell. Self-consistency and jackknife tests of the SVM method were conducted on three datasets of 1,911, 2,044, and 2,191 proteins. The correct prediction rates for the self-consistency and jackknife tests were, respectively, 94% and 83% for the 1,911 proteins, 92% and 78% for the 2,044 proteins, and 89% and 75% for the 2,191 proteins. In addition, the method was tested on three independent datasets of 2,148, 2,417, and 2,494 proteins, yielding correct prediction rates of 84%, 77%, and 74%, respectively.
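The difference between the self-consistency and jackknife rates reported above can be illustrated with a minimal Python sketch. It is not the authors' method: a nearest-centroid classifier on plain amino acid composition stands in for the SVM and the quasi-sequence-order features, and the sequences, labels, and function names (`composition`, `train`, `predict`, `self_consistency`, `jackknife`) are made up for illustration. Self-consistency trains and tests on the same proteins, while the jackknife (leave-one-out) test retrains without each protein before predicting it.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def train(samples):
    """Stand-in for SVM training: compute one centroid per class label."""
    sums, sizes = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        sums[label] = [s + v for s, v in zip(acc, vec)]
        sizes[label] = sizes.get(label, 0) + 1
    return {lab: [s / sizes[lab] for s in sums[lab]] for lab in sums}

def predict(model, vec):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((c - v) ** 2 for c, v in zip(centroid, vec))
    return min(model, key=lambda lab: dist2(model[lab]))

def self_consistency(samples):
    """Train on all samples, then predict those same samples."""
    model = train(samples)
    hits = sum(predict(model, vec) == label for vec, label in samples)
    return hits / len(samples)

def jackknife(samples):
    """Leave-one-out: each sample is predicted by a model trained without it."""
    hits = 0
    for i, (vec, label) in enumerate(samples):
        model = train(samples[:i] + samples[i + 1:])
        hits += predict(model, vec) == label
    return hits / len(samples)

# Hypothetical toy sequences (not from the paper's datasets).
TOY = [
    ("KRKRKKRRAK", "nucleus"),
    ("RKKRPKKRKR", "nucleus"),
    ("KKRRKRGKRK", "nucleus"),
    ("LLIVLLAVIL", "plasma membrane"),
    ("IVLLLIVGLL", "plasma membrane"),
    ("VLILLVALIV", "plasma membrane"),
    ("DEDEEDGEDE", "cytoplasm"),  # only member of its class
]
samples = [(composition(seq), label) for seq, label in TOY]

sc_rate = self_consistency(samples)
jk_rate = jackknife(samples)
print(f"self-consistency: {sc_rate:.2f}")
print(f"jackknife:        {jk_rate:.2f}")
```

In this toy set the single cytoplasm protein is recovered in the self-consistency test but cannot be recovered in the jackknife test, because its class disappears from the training data once it is held out. That is one reason jackknife rates tend to be lower and are usually read as the more conservative estimate.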

MeSH terms

  • Proteins / metabolism*
  • Subcellular Fractions / metabolism*

Substances

  • Proteins