A learned embedding for efficient joint analysis of millions of mass spectra

Nat Methods. 2022 Jun;19(6):675-678. doi: 10.1038/s41592-022-01496-1. Epub 2022 May 30.

Abstract

Computational methods that aim to exploit publicly available mass spectrometry repositories rely primarily on unsupervised clustering of spectra. Here we trained a deep neural network in a supervised fashion on the basis of previous assignments of peptides to spectra. The network, called 'GLEAMS', learns to embed spectra in a low-dimensional space in which spectra generated by the same peptide are close to one another. We applied GLEAMS for large-scale spectrum clustering, detecting groups of unidentified, proximal spectra representing the same peptide. We used these clusters to explore the dark proteome of repeatedly observed yet consistently unidentified mass spectra.
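
The idea of an embedding in which spectra from the same peptide lie close together, followed by grouping of proximal spectra, can be illustrated with a minimal sketch. This is not the GLEAMS implementation. The abstract does not specify the loss function or the clustering algorithm, so the contrastive loss, the single-linkage distance-threshold grouping, and all names and parameters here (`contrastive_loss`, `cluster_by_threshold`, `margin`, `threshold`) are assumptions chosen for illustration only.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Sequence[float], b: Sequence[float],
                     same_peptide: bool, margin: float = 1.0) -> float:
    """Hypothetical pairwise training objective (an assumption, not from the abstract).

    Pairs labeled as the same peptide are penalized by their squared distance;
    pairs from different peptides are penalized only if closer than `margin`.
    """
    d = euclidean(a, b)
    if same_peptide:
        return d ** 2
    return max(0.0, margin - d) ** 2


def cluster_by_threshold(embeddings: Sequence[Sequence[float]],
                         threshold: float) -> List[int]:
    """Group embeddings whose distance is <= threshold (single linkage, union-find).

    Returns one cluster label per embedding. This O(n^2) loop is only for
    illustration and would not scale to millions of spectra.
    """
    parent = list(range(len(embeddings)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if euclidean(embeddings[i], embeddings[j]) <= threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(embeddings))]


# Toy two-dimensional "embeddings": two tight groups far apart.
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels = cluster_by_threshold(toy, threshold=0.5)
```

In this toy example the first two points share one label and the last two share another. With real data, unidentified spectra that fall into such a group would be candidates for representing the same peptide.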

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Neural Networks, Computer
  • Peptides* / chemistry
  • Proteome / analysis
  • Tandem Mass Spectrometry* / methods

Substances

  • Peptides
  • Proteome