Prediction of protein signal sequences and their cleavage sites by statistical rulers

Biochem Biophys Res Commun. 2005 Dec 16;338(2):1005-11. doi: 10.1016/j.bbrc.2005.10.046. Epub 2005 Oct 21.

Abstract

Signal peptides (also called targeting signals or signal sequences) function as an "address tag" or "zip code" that guides nascent proteins (newly synthesized proteins in the cytosol) to wherever they are needed, and they have become a crucial tool in finding new drugs or reprogramming cells for gene therapy. To use such a tool effectively and in a timely manner, however, the first requirement is an automated method for quickly and accurately identifying the signal peptide of a given nascent protein. With the avalanche of new protein sequences generated in the post-genomic era, this challenge has become even more urgent. In this paper, five statistical rulers were derived by performing a mutual information analysis. By combining these statistical rulers, a new prediction algorithm was established, and high prediction success rates were observed. The new algorithm may play a complementary role to the existing algorithms in this area. It is anticipated that the mutual information approach introduced here may also be useful for studying many other sequence-coupling problems in molecular biology.
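
The abstract does not define the five statistical rulers, so the following is only a minimal sketch of the general quantity a mutual information analysis rests on, not the authors' algorithm. It estimates the mutual information I(X;Y) = sum over (x, y) of p(x,y) log2[p(x,y) / (p(x) p(y))] between two positions in a set of equal-length sequence fragments. The function names `mutual_information` and `column` and the toy fragments are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Estimate mutual information (in bits) between two columns of symbols.

    Probabilities are plain frequency estimates with no pseudocounts,
    which is an assumption of this sketch, not something the paper states.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def column(fragments, i):
    """Return the residues found at position i of every fragment."""
    return [frag[i] for frag in fragments]


# Hypothetical toy fragments (not data from the paper).
# Positions 0 and 1 always change together; position 2 varies independently.
toy = ["AVA", "AVG", "GLA", "GLG"]

coupled = mutual_information(column(toy, 0), column(toy, 1))
independent = mutual_information(column(toy, 0), column(toy, 2))
print(coupled, independent)
```

In this toy set the first two positions are perfectly coupled and carry one bit of mutual information, while the first and third positions are independent and carry none. A real analysis would need far more sequences than this, since frequency estimates from small samples inflate mutual information.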

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Binding Sites
  • Data Interpretation, Statistical
  • Models, Chemical*
  • Models, Statistical
  • Molecular Sequence Data
  • Peptide Hydrolases / analysis
  • Peptide Hydrolases / chemistry*
  • Peptides / analysis
  • Peptides / chemistry*
  • Protein Binding
  • Protein Sorting Signals*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*

Substances

  • Peptides
  • Protein Sorting Signals
  • Peptide Hydrolases