Discovery of False Identification Using Similarity Difference in GC-MS based Metabolomics

J Chemom. 2015 Feb 1;29(2):80-86. doi: 10.1002/cem.2665.

Abstract

Compound identification is a critical process in metabolomics. The most widely used approach for compound identification in gas chromatography-mass spectrometry (GC-MS) based metabolomics is spectrum matching, in which the mass spectral similarity between an experimental mass spectrum and each mass spectrum in a reference library is calculated. While various similarity measures have been developed to improve the overall accuracy of compound identification, little attention has been paid to reducing the false discovery rate. We therefore develop an approach for controlling the false identification rate using the distribution of the difference between the highest and second-highest spectral similarity scores. We further propose a model-based approach to achieving a desired true positive rate. The developed method is applied to the NIST mass spectral library, and its performance is compared with that of the conventional approach, which uses only the maximum spectral similarity score. The results show that the developed method achieves a significantly higher F1 score and positive predictive value than the conventional approach.

Keywords: Compound Identification; Gas Chromatography-Mass Spectrometry (GC-MS); Metabolomics; Similarity; True Positive Rate.
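
The idea of the similarity difference described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: cosine similarity is assumed here as the similarity measure, the function names `cosine_similarities` and `identify` are hypothetical, and the cutoff `min_difference` is an arbitrary illustrative value rather than one derived from the model-based approach in the paper.

```python
import numpy as np

def cosine_similarities(query, library):
    """Cosine similarity between one query spectrum and each library spectrum.

    Assumption: spectra are intensity vectors on a shared m/z grid.
    """
    query = np.asarray(query, dtype=float)
    library = np.asarray(library, dtype=float)
    norms = np.linalg.norm(library, axis=1) * np.linalg.norm(query)
    # Guard against all-zero spectra to avoid division by zero.
    return np.divide(library @ query, norms,
                     out=np.zeros(len(library)), where=norms > 0)

def identify(query, library, min_difference):
    """Return (best index, best score, score difference, accepted).

    A match is accepted only when the gap between the highest and the
    second-highest similarity scores reaches min_difference (hypothetical
    cutoff). The library must contain at least two spectra.
    """
    scores = cosine_similarities(query, library)
    order = np.argsort(scores)[::-1]          # indices sorted by descending score
    best, second = order[0], order[1]
    difference = scores[best] - scores[second]
    return int(best), float(scores[best]), float(difference), bool(difference >= min_difference)

# Toy library of three spectra over three m/z bins (made-up values).
library = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
best, score, difference, accepted = identify([1, 0, 0], library, min_difference=0.2)
print(best, round(score, 3), round(difference, 3), accepted)
```

In this sketch a query whose top two library hits score almost equally is rejected even if its maximum score is high, which is the contrast with a rule that thresholds the maximum score alone.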