Prediction of protein submitochondria locations based on data fusion of various features of sequences

J Theor Biol. 2011 Jan 21;269(1):208-16. doi: 10.1016/j.jtbi.2010.10.026. Epub 2010 Oct 30.

Abstract

In this study, we develop predictors of protein submitochondria locations based on various sequence features. Knowing the submitochondria location of a mitochondrial protein can provide a much better understanding of its function. We use ten representative models of protein samples, including pseudo amino acid composition, dipeptide composition, functional domain composition, a combined discrete model based on predicted solvent accessibility and secondary structure elements, and a discrete model of pairwise sequence similarity, among others. We construct a support vector machine (SVM) predictor for each representative model. In the leave-one-out cross-validation test, the predictor based on the discrete model of pairwise sequence similarity achieves an overall accuracy 1% higher than the best existing computational system for this problem. Moreover, we develop a method based on ordered weighted averaging (OWA), which is one of the data fusion operators. OWA is applied to the 11 best SVM-based classifiers, which are constructed from various sequence features. This method is called Mito-Loc. The overall leave-one-out cross-validation accuracy obtained by Mito-Loc is about 95%, which indicates that the proposed approach is superior to the best previously reported approach.
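
The OWA fusion step mentioned above can be illustrated with a minimal Python sketch. It uses the standard definition of the operator: the inputs are sorted in descending order and then combined with a fixed weight vector whose entries are non-negative and sum to 1. The abstract does not give the weights, the form of the classifier outputs, or the class labels, so the three classifiers, their per-class scores, the weight vector, and the location names below are all hypothetical and are not taken from Mito-Loc.

```python
from typing import Dict, List, Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to the values sorted descending."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def fuse(scores: List[Dict[str, float]], weights: Sequence[float]) -> str:
    """Fuse per-class scores from several classifiers and return the top class."""
    labels = scores[0].keys()
    fused = {
        label: owa([s[label] for s in scores], weights) for label in labels
    }
    return max(fused, key=fused.get)


# Hypothetical per-class scores from three classifiers (assumed values).
scores = [
    {"inner membrane": 0.7, "outer membrane": 0.2, "matrix": 0.1},
    {"inner membrane": 0.4, "outer membrane": 0.5, "matrix": 0.1},
    {"inner membrane": 0.6, "outer membrane": 0.1, "matrix": 0.3},
]

# Assumed weight vector; the first weight applies to the largest score.
WEIGHTS = [0.5, 0.3, 0.2]

predicted = fuse(scores, WEIGHTS)
print(predicted)
```

With these assumed inputs, the fused score for "inner membrane" is 0.5 × 0.7 + 0.3 × 0.6 + 0.2 × 0.4 = 0.61, the highest of the three classes. Choosing weights (1, 0, 0) reduces OWA to the maximum, (0, 0, 1) to the minimum, and uniform weights to the arithmetic mean, which is why the operator is used as a tunable compromise between these aggregation rules.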

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Computational Biology / methods*
  • Databases, Protein*
  • Mitochondria / metabolism*
  • Mitochondrial Proteins / chemistry*
  • Mitochondrial Proteins / metabolism*
  • Protein Transport
  • Sequence Analysis, Protein*

Substances

  • Mitochondrial Proteins