MAGIC: an automated N-linked glycoprotein identification tool using a Y1-ion pattern matching algorithm and in silico MS² approach

Anal Chem. 2015 Feb 17;87(4):2466-73. doi: 10.1021/ac5044829. Epub 2015 Jan 28.

Abstract

Glycosylation is a highly complex modification influencing the functions and activities of proteins. Interpretation of intact glycopeptide spectra is crucial but challenging. In this paper, we present a mass spectrometry-based automated glycopeptide identification platform (MAGIC) to identify peptide sequences and glycan compositions directly from intact N-linked glycopeptide collision-induced-dissociation spectra. The identification of the Y1 ion (the peptide ion Y0 plus one GlcNAc) is critical for the correct analysis of unknown glycoproteins, especially without prior knowledge of the proteins and glycans present in the sample. To ensure accurate Y1-ion assignment, we propose a novel algorithm called Trident that detects a triplet pattern corresponding to [Y0, Y1, Y2] or [Y0-NH3, Y0, Y1] from the fragmentation of the common trimannosyl core of N-linked glycopeptides. To facilitate the subsequent peptide sequence identification by common database search engines, MAGIC generates in silico spectra by overwriting the original precursor with the naked peptide m/z and removing all of the glycan-related ions. Finally, MAGIC computes the glycan compositions and ranks them. For the model glycoprotein horseradish peroxidase (HRP) and a 5-glycoprotein mixture, a 2- to 31-fold increase in the relative intensities of the peptide fragments was achieved, which led to the identification of 7 tryptic glycopeptides from HRP and 16 glycopeptides from the mixture via Mascot. In the HeLa cell proteome data set, MAGIC processed over a thousand MS² spectra in 3 min on a PC and reported 36 glycopeptides from 26 glycoproteins. Finally, a false discovery rate of 0 was achieved on the N-glycosylation-free Escherichia coli data set. MAGIC is available at http://ms.iis.sinica.edu.tw/COmics/Software_MAGIC.html.
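The triplet idea behind Trident can be illustrated with a minimal sketch. This is not the published implementation: the abstract does not describe Trident's scoring, charge-state handling, or tolerances, so the function names (`has_peak`, `find_y1_candidates`, `naked_peptide_mz`), the 0.02 Da tolerance, and the restriction to singly charged ions are all assumptions made for illustration. The sketch only checks whether a peak has neighbours at the mass spacings implied by [Y0, Y1, Y2] or [Y0-NH3, Y0, Y1], using the monoisotopic masses of a GlcNAc residue and of ammonia.

```python
from bisect import bisect_left

HEXNAC = 203.0794  # monoisotopic mass of one GlcNAc (HexNAc) residue, in Da
NH3 = 17.0265      # monoisotopic mass of ammonia, in Da


def has_peak(sorted_mzs, target, tol):
    """Return True if sorted_mzs contains a value within tol of target."""
    i = bisect_left(sorted_mzs, target - tol)
    return i < len(sorted_mzs) and sorted_mzs[i] <= target + tol


def find_y1_candidates(mzs, tol=0.02):
    """List (m/z, pattern) pairs for peaks that could be Y1.

    Assumption: all ions are singly charged, so spacings in m/z equal
    spacings in mass. A peak qualifies if Y0 (Y1 - GlcNAc) is present
    together with either Y2 (Y1 + GlcNAc) or Y0-NH3.
    """
    sorted_mzs = sorted(mzs)
    hits = []
    for y1 in sorted_mzs:
        y0 = y1 - HEXNAC
        if not has_peak(sorted_mzs, y0, tol):
            continue
        if has_peak(sorted_mzs, y1 + HEXNAC, tol):
            hits.append((y1, "Y0/Y1/Y2"))
        elif has_peak(sorted_mzs, y0 - NH3, tol):
            hits.append((y1, "Y0-NH3/Y0/Y1"))
    return hits


def naked_peptide_mz(y1_mz):
    """m/z of the singly charged naked peptide (Y0) implied by a Y1 peak."""
    return y1_mz - HEXNAC


# Toy peak lists (hypothetical m/z values, not data from the paper).
ladder = [500.0, 700.0, 1000.0, 1203.0794, 1406.1588]
ammonia_loss = [982.9735, 1000.0, 1203.0794]

ladder_hits = find_y1_candidates(ladder)
ammonia_hits = find_y1_candidates(ammonia_loss)
```

Under these assumptions, the value returned by `naked_peptide_mz` is the one that would replace the original precursor m/z in the in silico spectrum before it is passed to a database search engine. Removal of glycan-related ions is left out of the sketch because the abstract does not specify which peaks are treated as glycan-related.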

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Automation
  • Computational Biology*
  • Databases, Factual
  • Escherichia coli / chemistry
  • Glycopeptides / analysis*
  • Glycopeptides / chemistry
  • HeLa Cells
  • Humans
  • Software*

Substances

  • Glycopeptides