SPECTRUM - A MATLAB Toolbox for Proteoform Identification from Top-Down Proteomics Data

Sci Rep. 2019 Aug 2;9(1):11267. doi: 10.1038/s41598-019-47724-1.

Abstract

Top-Down Proteomics (TDP) is an emerging proteomics protocol that involves identification, characterization, and quantitation of intact proteins using high-resolution mass spectrometry. TDP has an edge over other proteomics protocols in that it allows for: (i) accurate measurement of intact protein mass, (ii) high sequence coverage, and (iii) enhanced identification of post-translational modifications (PTMs). However, the complexity of TDP spectra poses a significant impediment to protein search and PTM characterization. Furthermore, software support in the form of search algorithms and pipelines is currently limited. To address this need, we propose 'SPECTRUM', an open-architecture and open-source toolbox for TDP data analysis. Its salient features include: (i) MS2-based intact protein mass tuning, (ii) de novo peptide sequence tag analysis, (iii) propensity-driven PTM characterization, (iv) blind PTM search, (v) spectral comparison, (vi) identification of truncated proteins, (vii) multifactorial coefficient-weighted scoring, and (viii) intuitive graphical user interfaces for accessing these functionalities and visualizing results. We have validated SPECTRUM using published datasets and benchmarked it against salient TDP tools. SPECTRUM provides significantly enhanced protein identification rates (91% to 177%) over its contemporaries. SPECTRUM has been implemented in MATLAB, and is freely available along with its source code and documentation at https://github.com/BIRL/SPECTRUM/.
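
The general idea behind de novo peptide sequence tag analysis, feature (ii) above, can be illustrated with a minimal sketch. It is written in Python for illustration and is not SPECTRUM's MATLAB implementation. It assumes singly charged, deisotoped fragment masses from one ion series. The function name find_tags, the 0.01 Da tolerance, and the minimum tag length of 3 are hypothetical choices. Wherever the gap between two fragment masses matches a monoisotopic residue mass, the corresponding residue letter is read off, and consecutive matches are chained into a tag.

```python
# Monoisotopic amino acid residue masses in Da.
# Leucine and isoleucine are isobaric, so "L" stands for either.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def find_tags(peaks, tol=0.01, min_len=3):
    """Return maximal sequence tags read from fragment mass gaps.

    peaks   -- fragment masses in Da (assumed deisotoped, one ion series)
    tol     -- absolute mass tolerance in Da (illustrative value)
    min_len -- shortest tag to report (illustrative value)
    """
    peaks = sorted(peaks)
    n = len(peaks)

    # edges[i] lists (j, residue) pairs where peaks[j] - peaks[i]
    # matches a residue mass within the tolerance.
    edges = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            gap = peaks[j] - peaks[i]
            for residue, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    edges[i].append((j, residue))

    tags = set()

    def extend(i, tag):
        # Follow every matching gap; record the tag once it cannot grow.
        if not edges[i]:
            if len(tag) >= min_len:
                tags.add(tag)
            return
        for j, residue in edges[i]:
            extend(j, tag + residue)

    # Start only from peaks that no other peak leads into,
    # so that only maximal tags are reported.
    has_incoming = {j for outs in edges.values() for j, _ in outs}
    for i in range(n):
        if i not in has_incoming:
            extend(i, "")
    return sorted(tags)


# Synthetic ladder spelling P-E-V-T from an arbitrary 500.0 Da start,
# plus one unrelated noise peak at 650.3 Da.
ladder = [500.0]
for residue in "PEVT":
    ladder.append(ladder[-1] + RESIDUE_MASS[residue])
print(find_tags(ladder + [650.3]))  # prints ['PEVT']
```

Real top-down spectra call for charge-state deconvolution, ppm-scaled tolerances, and handling of isobaric combinations (for example, G+A has the same mass as Q), all of which this sketch leaves out.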

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Databases, Protein
  • Datasets as Topic
  • HeLa Cells
  • Humans
  • Molecular Weight
  • Protein Isoforms / chemistry
  • Protein Isoforms / isolation & purification
  • Protein Isoforms / metabolism
  • Protein Processing, Post-Translational
  • Proteome / chemistry
  • Proteome / isolation & purification
  • Proteome / metabolism
  • Proteomics / methods*
  • Sequence Analysis, Protein / methods
  • Software*

Substances

  • Protein Isoforms
  • Proteome