pLoc-mEuk: Predict subcellular localization of multi-label eukaryotic proteins by extracting the key GO information into general PseAAC

Genomics. 2018 Jan;110(1):50-58. doi: 10.1016/j.ygeno.2017.08.005. Epub 2017 Aug 14.

Abstract

Many efforts have been made to predict the subcellular localization of eukaryotic proteins, but most existing methods have two limitations: (1) they cover fewer than ten locations, so many organelles in a eukaryotic cell are left out, and (2) they can only handle single-label systems, in which each constituent protein has one and only one location. Proteins with multiple locations are particularly interesting, however, because they may have exceptional functions that are important both for an in-depth understanding of the biological processes in a cell and for selecting drug targets. Although several predictors (such as "Euk-mPLoc", "Euk-PLoc 2.0" and "iLoc-Euk") cover up to 22 different location sites and can also treat multi-label proteins, further efforts are needed to improve their prediction quality, particularly by raising the absolute true rate and lowering the absolute false rate. Here we propose a new predictor called "pLoc-mEuk", developed by extracting the key GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validation on a high-quality and stringent benchmark dataset indicated that pLoc-mEuk is remarkably superior to iLoc-Euk, the best of the three aforementioned predictors. For the convenience of most experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mEuk/, where users can easily obtain their desired results without going through the complicated mathematics involved.

Keywords: Chou's metrics; GO; ML-GKR; Multi-label system; PseAAC.
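
The absolute true rate and absolute false rate mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the definitions commonly used for multi-label systems: the absolute true rate is the fraction of proteins whose predicted location set matches the experimental set exactly, and the absolute false rate is the average number of wrongly assigned or missed labels per protein, divided by the total number of location labels M. The function names and the toy data are hypothetical and are not taken from the paper or its web server.

```python
from typing import List, Set


def absolute_true(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Fraction of samples whose predicted label set equals the true set exactly."""
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need two non-empty lists of equal length")
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return exact / len(true_sets)


def absolute_false(
    true_sets: List[Set[str]], pred_sets: List[Set[str]], num_labels: int
) -> float:
    """Average of (|union| - |intersection|) / M over all samples.

    |union| - |intersection| counts labels that are either over-predicted
    or missed for a sample; num_labels is M, the size of the label space.
    """
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need two non-empty lists of equal length")
    if num_labels <= 0:
        raise ValueError("num_labels must be positive")
    wrong = sum(len(t | p) - len(t & p) for t, p in zip(true_sets, pred_sets))
    return wrong / (num_labels * len(true_sets))


# Toy example (made-up proteins; M = 22 mirrors the 22 location sites
# mentioned in the abstract).
true_locations = [{"cytoplasm", "nucleus"}, {"mitochondrion"}, {"nucleus"}]
pred_locations = [{"cytoplasm", "nucleus"}, {"mitochondrion", "nucleus"}, {"cytoplasm"}]

print("absolute true :", absolute_true(true_locations, pred_locations))
print("absolute false:", absolute_false(true_locations, pred_locations, 22))
```

Under these assumed definitions, a perfect predictor scores 1 on absolute true and 0 on absolute false. A prediction that is only partly right for a protein counts as a complete miss for absolute true, which is why this metric is far stricter than per-label accuracy.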

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cytoplasm / metabolism
  • Protein Transport
  • Sequence Analysis, Protein / methods*
  • Software*