GP4: an integrated Gram-Positive Protein Prediction Pipeline for subcellular localization mimicking bacterial sorting

Brief Bioinform. 2021 Jul 20;22(4):bbaa302. doi: 10.1093/bib/bbaa302.

Abstract

Subcellular localization is a critical aspect of both protein function and the potential application of proteins, whether as drugs or drug targets or in industrial and domestic settings. However, experimental determination of protein localization is time-consuming and expensive. Various localization predictors have therefore been developed for particular groups of species. Yet although Gram-positive bacteria are heavily represented among biotechnological cell factories and pathogens, a meta-predictor based on sorting signals and specific to them was still lacking. Here we present GP4, a protein subcellular localization meta-predictor designed mainly for Firmicutes, but also applicable to Actinobacteria, that combines multiple tools, each specific for different sorting signals and compartments. Novel elements include improved cell-wall protein prediction with differentiation of the type of interaction, prediction of proteins targeted by non-canonical secretion pathways, separate prediction of lipoproteins, and a better user experience in terms of parsability and interpretability of the results. GP4 aims to mimic protein sorting as it would happen in a bacterial cell. Because GP4 is not homology based, it is broadly applicable and does not depend on annotated databases of homologous proteins. Non-canonical uses may include little-studied or novel species, synthetic and engineered organisms, and even reuse of the prediction data to develop custom prediction algorithms. Our benchmark analysis highlights the improved performance of GP4 compared with other widely used subcellular protein localization predictors. A webserver running GP4 is available at http://gp4.hpc.rug.nl/.
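
The idea of combining signal-specific tools into one localization call can be illustrated with a minimal sketch. It is not GP4's actual rule set: the field names, the compartment labels and, above all, the order in which the signals are checked are assumptions made for illustration, and the upstream detectors are represented only by their hypothetical yes/no outputs.

```python
from dataclasses import dataclass


@dataclass
class SignalCalls:
    """Hypothetical per-protein outputs of upstream signal detectors."""
    signal_peptide: bool = False            # canonical secretion signal found
    lipobox: bool = False                   # lipoprotein signal found
    cell_wall_anchor: bool = False          # covalent anchoring motif found
    cell_wall_binding_domain: bool = False  # non-covalent binding domain found
    transmembrane_helices: int = 0          # number of predicted helices
    noncanonical_secretion: bool = False    # non-canonical pathway signal found


def assign_compartment(calls: SignalCalls) -> str:
    """Walk an assumed decision cascade, most specific signal first.

    The ordering is an illustrative assumption, not the published method.
    """
    if calls.lipobox:
        return "lipoprotein"
    if calls.cell_wall_anchor:
        return "cell wall (covalent)"
    if calls.cell_wall_binding_domain:
        return "cell wall (non-covalent)"
    if calls.transmembrane_helices > 0:
        return "membrane"
    if calls.signal_peptide or calls.noncanonical_secretion:
        return "extracellular"
    # No sorting signal detected: the protein is assumed to stay in the cell.
    return "cytoplasmic"


if __name__ == "__main__":
    # A protein with a signal peptide and a covalent anchoring motif.
    example = SignalCalls(signal_peptide=True, cell_wall_anchor=True)
    print(assign_compartment(example))
```

Because each decision depends only on detected signals rather than on similarity to annotated proteins, a cascade of this kind needs no homolog database, which is the property the abstract attributes to GP4.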

Keywords: GP4; Gram-positive; homology-based prediction; prediction methods; protein subcellular localization prediction; sorting signals.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Actinobacteria* / genetics
  • Actinobacteria* / metabolism
  • Algorithms*
  • Bacterial Proteins* / genetics
  • Bacterial Proteins* / metabolism
  • Computational Biology*
  • Databases, Protein*
  • Firmicutes* / genetics
  • Firmicutes* / metabolism
  • Sequence Analysis, Protein

Substances

  • Bacterial Proteins