Genetic programming for creating Chou's pseudo amino acid based features for submitochondria localization

Amino Acids. 2008 May;34(4):653-60. doi: 10.1007/s00726-007-0018-1. Epub 2008 Jan 4.

Abstract

Given a protein that is localized in the mitochondria, it is very important to know its submitochondria localization in order to understand its function. In this work, we propose a submitochondria localizer whose feature extraction method is based on Chou's pseudo-amino acid composition. The pseudo-amino acid based features are obtained by combining pseudo-amino acid compositions with hundreds of amino-acid indices and amino-acid substitution matrices; from this huge set of features, a small set of 15 "artificial" features is then created. The feature creation is performed by genetic programming, which combines one or more "original" features by means of mathematical operators. Finally, the set of combined features is used to train a radial basis function support vector machine. This method is named GP-Loc. We also propose a method with very few parameters, named ALL-Loc, in which all the "original" features are used to train a linear support vector machine. The overall prediction accuracy obtained by GP-Loc is 89% under jackknife cross-validation, which outperforms the result previously reported in the literature (85.2%) on the same dataset, while the overall prediction accuracy obtained by ALL-Loc is 83.9%.
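
As an illustration of the kind of "original" feature the abstract refers to, the following minimal Python sketch computes a Chou-style pseudo-amino acid composition vector from a single amino-acid index. It is not the GP-Loc implementation. The Kyte-Doolittle hydropathy scale, the function names `standardize` and `pseaac`, the parameters `lam` (number of sequence-order lags) and `weight`, and the example sequence are all assumptions made for this example; the paper itself combines hundreds of indices and substitution matrices.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here as ONE illustrative index
# (an assumption; the paper draws on hundreds of amino-acid indices).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(index):
    """Rescale an index to zero mean and unit variance over the 20 residues."""
    mu = mean(index.values())
    sigma = pstdev(index.values())
    return {aa: (value - mu) / sigma for aa, value in index.items()}


def pseaac(sequence, index, lam=3, weight=0.05):
    """Return 20 composition terms plus `lam` sequence-order terms."""
    seq = sequence.upper()
    if any(residue not in AMINO_ACIDS for residue in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    h = standardize(index)
    length = len(seq)

    # Classical amino acid composition: relative frequency of each residue.
    freqs = [seq.count(aa) / length for aa in AMINO_ACIDS]

    # Sequence-order correlation factors: mean squared difference of the
    # index between residues that are `lag` positions apart.
    thetas = []
    for lag in range(1, lam + 1):
        diffs = [(h[seq[i + lag]] - h[seq[i]]) ** 2 for i in range(length - lag)]
        thetas.append(mean(diffs))

    # Joint normalization so that all 20 + lam components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# Hypothetical 10-residue sequence, for demonstration only.
features = pseaac("MKTAYIAKQR", KYTE_DOOLITTLE, lam=3)
print(len(features))
```

With `lam=3` the function returns a 23-dimensional vector whose components sum to 1. Repeating the calculation with a different index gives a different set of sequence-order terms, which is how a large pool of candidate features can be built before any feature combination step.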

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / chemistry*
  • Computational Biology*
  • Computer Simulation*
  • Databases, Protein
  • Mitochondria / chemistry*
  • Models, Genetic*
  • Proteins / chemistry*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, Protein

Substances

  • Amino Acids
  • Proteins