Unknown identification using reference mass spectra. Quality evaluation of databases

J Am Soc Mass Spectrom. 1999 Dec;10(12):1229-40. doi: 10.1016/S1044-0305(99)00104-X.

Abstract

The high success of the "uncertified" mass spectrometry spectral collection started in 1956 demonstrated qualitatively that a partial reference mass spectrum, even one measured routinely, can be of real value. Correct matches were still possible despite reference errors, which almost never led to close matches that were incorrect. This study shows quantitatively that the number of different compounds, not the number of peaks per spectrum, is by far the most important determinant of database efficiency for identifying a "global" unknown. A statistical evaluation of matching performance shows that reference spectra with only 6, 12, and 18 peaks are 13%, 67%, and 96%, respectively, as valuable as spectra with hundreds of peaks. Also, a separately measured second spectrum of the same compound is 50% as valuable as the first. Database expansion that tripled the number of possible wrong answers reduced the proportion of correct identifications by only 5%. Correcting a mass or abundance error in each of six reference spectra increases database matching performance by as much as adding one spectrum of a new compound. A new "matching quality index" based statistically on these values indicates that the largest database is also by far the most effective for matching unknowns.
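
The abstract does not give the formula for the "matching quality index", so the following is only an illustrative sketch of how the stated values might be combined into a single score for a database. The function names, the piecewise-linear interpolation between the stated peak counts, the treatment of any spectrum with more than 18 peaks as fully valuable, the 50% weight for every additional spectrum after the first, and the subtraction of one sixth of a spectrum per erroneous entry are all assumptions made for illustration, not the authors' method.

```python
# Values taken from the abstract: (peak count, relative value of the spectrum).
PEAK_VALUE_POINTS = [(6, 0.13), (12, 0.67), (18, 0.96)]
DUPLICATE_VALUE = 0.5       # a second spectrum of the same compound is 50% as valuable
ERROR_PENALTY = 1.0 / 6.0   # six corrected errors are worth one new-compound spectrum


def peak_value(n_peaks):
    """Relative value of a reference spectrum with n_peaks peaks.

    Assumption: linear interpolation between the abstract's stated points,
    starting from (0, 0), and full value (1.0) above 18 peaks.
    """
    if n_peaks <= 0:
        return 0.0
    if n_peaks > 18:
        return 1.0
    points = [(0, 0.0)] + PEAK_VALUE_POINTS
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if n_peaks <= x1:
            return y0 + (y1 - y0) * (n_peaks - x0) / (x1 - x0)
    return 1.0  # not reached; kept for safety


def database_score(spectra):
    """Hypothetical quality score for a reference database.

    spectra: iterable of (compound_id, n_peaks, has_error) tuples.
    The first spectrum of each compound gets weight 1.0; later spectra of
    the same compound get DUPLICATE_VALUE (an assumed extension of the
    abstract's statement about a second spectrum).
    """
    seen = set()
    score = 0.0
    for compound_id, n_peaks, has_error in spectra:
        weight = DUPLICATE_VALUE if compound_id in seen else 1.0
        seen.add(compound_id)
        score += weight * peak_value(n_peaks)
        if has_error:
            score -= ERROR_PENALTY
    return score


# Example: two full spectra of compound A, one 12-peak spectrum of B with an error.
example_db = [("A", 200, False), ("A", 200, False), ("B", 12, True)]
example_score = database_score(example_db)
```

Under these assumptions, adding spectra of new compounds raises the score faster than adding peaks or duplicate spectra, which mirrors the abstract's qualitative conclusion.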

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.
  • Review

MeSH terms

  • Animals
  • Databases, Factual / standards*
  • Humans
  • Mass Spectrometry / standards*