Using Nearest Feature Line and Tunable Nearest Neighbor methods for prediction of protein subcellular locations

Qing-Bin Gao; Zheng-Zhi Wang

doi:10.1016/j.compbiolchem.2005.08.002

Comput Biol Chem. 2005 Oct;29(5):388-92. doi: 10.1016/j.compbiolchem.2005.08.002. Epub 2005 Oct 5.

Authors

Qing-Bin Gao¹, Zheng-Zhi Wang

Affiliation

¹ Institute of Automation, National University of Defense Technology, Changsha, 410073 Hunan, PR China. gqb_kd@yahoo.com.cn

PMID: 16213794
DOI: 10.1016/j.compbiolchem.2005.08.002

Abstract

The subcellular location of a protein is closely correlated with its biological function. In this paper, two new pattern classification methods, termed Nearest Feature Line (NFL) and Tunable Nearest Neighbor (TNN), are introduced to predict the subcellular location of proteins based on their amino acid composition alone. The simulation experiments were performed with the jackknife test on a previously constructed data set, which consists of 2,427 eukaryotic and 997 prokaryotic proteins. All protein sequences in the data set fall into four eukaryotic subcellular locations and three prokaryotic subcellular locations. The NFL classifier reached total prediction accuracies of 82.5% for the eukaryotic proteins and 91.0% for the prokaryotic proteins. The TNN classifier reached total prediction accuracies of 83.6% and 92.2%, respectively. It is clear that high prediction accuracies have been achieved. Compared with Support Vector Machine (SVM) and Nearest Neighbor methods, these two methods display similar or even higher prediction accuracies. Hence, we conclude that NFL and TNN can be used as complementary methods for prediction of protein subcellular locations.
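
The abstract does not spell out the classifiers, so the following is only a minimal sketch of how an NFL-style prediction from amino acid composition could look with a jackknife (leave-one-out) evaluation. It assumes the commonly cited Nearest Feature Line formulation, in which a query is assigned to the class owning the closest straight line through any pair of that class's training points. All function names here are hypothetical, the fallback for classes with a single prototype is an assumption, and TNN is not sketched because its details are not given in this record.

```python
from collections import Counter
from itertools import combinations
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of seq."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def feature_line_distance(x, p1, p2):
    """Euclidean distance from x to the infinite line through p1 and p2."""
    direction = [b - a for a, b in zip(p1, p2)]
    denom = sum(v * v for v in direction)
    if denom == 0.0:
        # Identical prototypes: the line degenerates to a point.
        return math.dist(x, p1)
    # Position of the projection of x along the line (mu may be <0 or >1).
    mu = sum((xi - ai) * di for xi, ai, di in zip(x, p1, direction)) / denom
    projection = [ai + mu * di for ai, di in zip(p1, direction)]
    return math.dist(x, projection)


def nfl_predict(x, train):
    """Predict a label for x from train, a list of (vector, label) pairs."""
    by_class = {}
    for vec, label in train:
        by_class.setdefault(label, []).append(vec)
    best_label, best_dist = None, math.inf
    for label, vecs in by_class.items():
        if len(vecs) < 2:
            # Assumption: with one prototype, fall back to point distance.
            dists = [math.dist(x, v) for v in vecs]
        else:
            dists = [feature_line_distance(x, a, b)
                     for a, b in combinations(vecs, 2)]
        nearest = min(dists)
        if nearest < best_dist:
            best_label, best_dist = label, nearest
    return best_label


def jackknife_accuracy(data):
    """Leave each sample out in turn, predict it from the rest, and score."""
    correct = 0
    for i, (vec, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if nfl_predict(vec, rest) == label:
            correct += 1
    return correct / len(data)
```

In use, each protein sequence would be converted with `composition` and paired with its location label before calling `jackknife_accuracy`. Because every pair of same-class prototypes defines a line, the cost per query grows quadratically with class size, which matters for data sets of a few thousand proteins.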

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computational Biology / methods*
Databases, Protein
Proteins / analysis
Proteins / chemistry*
Software

Substances

Proteins