Prediction of Signal Peptide Cleavage Sites with Subsite-Coupled and Template Matching Fusion Algorithm

Mol Inform. 2014 Mar;33(3):230-9. doi: 10.1002/minf.201300077. Epub 2014 Mar 11.

Abstract

Fast and effective prediction of signal peptides (SPs) and their cleavage sites is of great importance in computational biology. Existing approaches to signal peptide prediction can be roughly divided into machine learning based and sliding window based methods. To further increase prediction accuracy and organism coverage for SP cleavage sites, we propose a novel method called Signal-CTF that combines machine learning with sliding windows and is designed for N-terminal secretory proteins from a wide variety of organisms, including human, animal, plant, virus, bacteria, fungi and archaea. Signal-CTF consists of three distinct elements: (1) a subsite-coupled and regularization function with a scaled window of fixed width that selects a set of candidate secretion-cleavable segments for a query secretory protein; (2) a sum fusion system that integrates the outcomes of aligning the cleavage site template sequence with each of these candidates in a scaled window of fixed width to determine the best candidate cleavage sites for the query secretory protein; (3) a voting system that identifies the final signal peptide cleavage site among all results obtained with scaled windows of different widths. Compared with the Signal-3L and SignalP 4.0 predictors, the prediction accuracy of Signal-CTF is 4-12 % higher than that of Signal-3L for human, animal and eukaryote, and 10-25 % higher than that of SignalP 4.0 for eukaryota, Gram-positive bacteria and Gram-negative bacteria. Compared with the PRED-SIGNAL and SignalP 4.0 predictors on the 32 archaea secretory proteins used in Bagos's paper, the prediction accuracy of Signal-CTF is 12.5 % and 25 % higher than that of PRED-SIGNAL and SignalP 4.0, respectively.
Prediction results for several long signal peptides show that Signal-CTF predicts their cleavage sites better than SignalP, Phobius, Philius, SPOCTOPUS, Signal-CF and Signal-3L. These results show that Signal-CTF is more accurate and flexible in predicting signal peptides of different characteristics for many organisms. Signal-CTF is freely available as a web server at http://darwin2.cbi.utsa.edu/minniweb/index.html.
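
The template matching, sum fusion and voting stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the identity-fraction similarity measure, the (left, right) window representation and the tie-breaking behaviour are all assumptions made for illustration, and the candidate sites (stage 1, the subsite-coupled selection) are assumed to be given as input.

```python
from collections import Counter

def template_match_score(segment, template):
    """Hypothetical similarity: fraction of aligned positions with identical residues."""
    return sum(a == b for a, b in zip(segment, template)) / len(template)

def best_site_for_window(sequence, candidates, templates, left, right):
    """Sum-fuse the template scores of each candidate site and return the top-scoring site.

    A site is the 0-based index of the first residue after the cleavage point;
    the window covers `left` residues before it and `right` residues from it onward.
    """
    fused = {}
    for site in candidates:
        if site - left < 0 or site + right > len(sequence):
            continue  # window does not fit inside the query sequence
        segment = sequence[site - left:site + right]
        fused[site] = sum(template_match_score(segment, t) for t in templates)
    return max(fused, key=fused.get) if fused else None

def vote(sites):
    """Majority vote over per-window predictions (ties go to the first site seen)."""
    votes = Counter(s for s in sites if s is not None)
    return votes.most_common(1)[0][0] if votes else None

def predict_cleavage_site(sequence, candidates, known, windows):
    """Run template matching in windows of different widths and vote on the result.

    `known` is a list of (sequence, cleavage_site) pairs used as templates.
    """
    per_window = []
    for left, right in windows:
        templates = [s[c - left:c + right] for s, c in known
                     if c - left >= 0 and c + right <= len(s)]
        if templates:
            per_window.append(
                best_site_for_window(sequence, candidates, templates, left, right))
    return vote(per_window)
```

With one toy template and three window widths, a call such as `predict_cleavage_site("MRLLVAVLAASAQDTKK", [8, 12, 14], [("MKALLVLAASAQDTPE", 11)], [(3, 2), (4, 2), (5, 3)])` selects the candidate whose surrounding residues best match the template in each window and then returns the site chosen by most windows.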

Keywords: Fusion; Signal peptide cleavage site; Subsite-coupled; Template matching; Variable width scaled window.