An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets

Daniel Schwartz; Steven P Gygi

doi:10.1038/nbt1146

Nat Biotechnol. 2005 Nov;23(11):1391-8. doi: 10.1038/nbt1146.

Authors

Daniel Schwartz¹, Steven P Gygi

Affiliation

¹ Department of Cell Biology, 240 Longwood Ave., Harvard Medical School, Boston, Massachusetts 02115, USA. dschwartz@hms.harvard.edu

PMID: 16273072

Abstract

With the recent exponential increase in protein phosphorylation sites identified by mass spectrometry, a unique opportunity has arisen to understand the motifs surrounding such sites. Here we present an algorithm designed to extract motifs from large data sets of naturally occurring phosphorylation sites. The methodology relies on the intrinsic alignment of phospho-residues and the extraction of motifs through iterative comparison to a dynamic statistical background. Results show the identification of dozens of novel and known phosphorylation motifs from recently published serine, threonine and tyrosine phosphorylation studies. When applied to a linguistic data set to test the versatility of the approach, the algorithm successfully extracted hundreds of language motifs. This method, in addition to shedding light on the consensus sequences of identified and as yet unidentified kinases and modular protein domains, may also eventually be used as a tool to determine potential phosphorylation sites in proteins of interest.
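The idea of iterative comparison to a dynamic statistical background can be illustrated with a minimal sketch. The abstract does not state the statistical test, the thresholds, or any function names, so everything below is an assumption made for illustration: a binomial tail probability as the significance measure, the hypothetical names `binomial_tail` and `extract_motif`, and the defaults `alpha=1e-3` and `min_occurrences=3`. Sequences are assumed to be equal-length windows centred on the phosphorylated residue, which is what gives the intrinsic alignment.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def extract_motif(foreground, background, alpha=1e-3, min_occurrences=3):
    """Greedily build one motif as a {position: residue} mapping.

    Assumption: every sequence has the same odd length and is centred
    on the modified residue, so column indices are directly comparable.
    """
    center = len(foreground[0]) // 2
    motif = {}
    fg, bg = list(foreground), list(background)
    while True:
        best = None  # (p_value, position, residue)
        for pos in range(len(fg[0])):
            if pos == center or pos in motif:
                continue
            fg_counts = Counter(seq[pos] for seq in fg)
            bg_counts = Counter(seq[pos] for seq in bg)
            for residue, k in fg_counts.items():
                # Simplification for this sketch: skip rare residues and
                # residues with no background estimate at this position.
                if k < min_occurrences or bg_counts[residue] == 0:
                    continue
                p_bg = bg_counts[residue] / len(bg)
                p_value = binomial_tail(k, len(fg), p_bg)
                if p_value < alpha and (best is None or p_value < best[0]):
                    best = (p_value, pos, residue)
        if best is None:
            return motif
        _, pos, residue = best
        motif[pos] = residue
        # The background is "dynamic": both sets are narrowed to the
        # sequences that carry the residue just fixed, then re-tested.
        fg = [s for s in fg if s[pos] == residue]
        bg = [s for s in bg if s[pos] == residue]


if __name__ == "__main__":
    # Toy data: serine-centred 7-mers, with proline enriched at +1.
    toy_foreground = ["AAASPAA"] * 8 + ["GGGSGGG"] * 2
    toy_background = ["AAASAAA"] * 9 + ["AAASPAA"]
    print(extract_motif(toy_foreground, toy_background))
```

A fuller procedure would presumably remove the sequences matching each extracted motif from the foreground and repeat the search to find further motifs; that outer loop is omitted here to keep the sketch short.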

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Amino Acid Motifs
Biotechnology / methods*
Computational Biology / methods*
Data Interpretation, Statistical*
Internet
Mass Spectrometry
Models, Statistical
Peptide Mapping
Phosphorylation
Programming Languages
Proteins / chemistry*
Sequence Analysis, Protein / methods
Software

Substances

Proteins

Grants and funding

HG03456/HG/NHGRI NIH HHS/United States