How searching against multiple libraries can lead to biased results in GC/MS-based metabolomics

Rapid Commun Mass Spectrom. 2023 Feb 15;37(3):e9437. doi: 10.1002/rcm.9437.

Abstract

Rationale: Databases of electron ionization mass spectra are often used in GC/MS-based untargeted metabolomics analysis. The results of a library search depend on several factors, such as the size and quality of the database and the library search algorithm. We found that the list of m/z values considered during the search is another important parameter. Unfortunately, software developers do not usually document this information, and it is hidden from the end user.

Methods: We created synthetic data sets and determined how several popular software products (AMDIS, ChromaTOF, MS Search, and Xcalibur) select the list of m/z values for the library search. In addition, we considered data sets of real mass spectra present in both the NIST and FiehnLib libraries and compared the library search results obtained with the different software products. All of the programs under consideration use the NIST MS Search binaries to perform the library search with the Identity algorithm.

Results: We found that AMDIS and ChromaTOF can give biased library search results under particular conditions. In untargeted metabolomics, this can happen when the NIST and FiehnLib libraries are searched simultaneously, the lower limit of the instrument's scan range is below m/z 85, and the correct answer is present only in the FiehnLib library.

Conclusions: The main reason for the biased results is that the scan range is not stored in the metadata of library records. As a result, AMDIS and ChromaTOF treat some unrecorded peaks as missing during the library search, the corresponding compound is penalized, and the correct answer falls outside the top five or even the top 10 hits. By contrast, the default algorithm for selecting the list of considered m/z values implemented in MS Search does not show this unexpected behavior.
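The penalty described in the Conclusions can be illustrated with a toy calculation. The following is a minimal Python sketch under stated assumptions, not the NIST Identity algorithm (which applies its own m/z and intensity weighting) and not the actual m/z-selection rules of AMDIS, ChromaTOF, or MS Search. It uses a plain cosine similarity, invented spectra, an assumed instrument scan range of m/z 50 to 300, and two hypothetical helpers: `mz_list_scan_range` considers every m/z in the query scan range, while `mz_list_trimmed` starts at the library record's lowest recorded peak as a crude stand-in for the acquisition limit that the record's metadata does not store.

```python
import math


def cosine(query, library, mz_values):
    """Plain cosine similarity of two {m/z: intensity} spectra over an explicit
    list of m/z values. An m/z with no recorded peak contributes intensity 0,
    i.e. it is treated as a missing peak."""
    q = [query.get(mz, 0.0) for mz in mz_values]
    r = [library.get(mz, 0.0) for mz in mz_values]
    dot = sum(a * b for a, b in zip(q, r))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in r))
    return dot / norm if norm else 0.0


def mz_list_scan_range(library, scan_low, scan_high):
    """Hypothetical strategy A: consider every m/z in the query's scan range, so
    query peaks below the record's (unstored) acquisition limit look missing."""
    return list(range(scan_low, scan_high + 1))


def mz_list_trimmed(library, scan_low, scan_high):
    """Hypothetical strategy B: start at the record's lowest recorded peak,
    a crude stand-in for the acquisition limit absent from the metadata."""
    return list(range(max(scan_low, min(library)), scan_high + 1))


SCAN_LOW, SCAN_HIGH = 50, 300  # assumed scan range with a lower limit below m/z 85

# Invented spectra for illustration only.
query = {57: 40.0, 73: 100.0, 103: 10.0, 147: 60.0, 217: 30.0}
correct_fiehn = {103: 10.0, 147: 60.0, 217: 30.0}  # recorded from m/z 85 upward
decoy_nist = {57: 40.0, 73: 100.0, 147: 20.0}      # a different compound

for name, pick in [("A", mz_list_scan_range), ("B", mz_list_trimmed)]:
    s_correct = cosine(query, correct_fiehn, pick(correct_fiehn, SCAN_LOW, SCAN_HIGH))
    s_decoy = cosine(query, decoy_nist, pick(decoy_nist, SCAN_LOW, SCAN_HIGH))
    print(f"strategy {name}: correct={s_correct:.3f} decoy={s_decoy:.3f}")
# strategy A: correct=0.533 decoy=0.918
# strategy B: correct=1.000 decoy=0.918
```

Under strategy A, the query peaks at m/z 57 and 73 count as missing from the FiehnLib-style record, so the correct compound ranks below the decoy. Under strategy B, the ranking is restored. Strategy B is only a heuristic for this toy case: a genuine record whose lowest peak is high simply because the compound has no low-mass fragments would be trimmed the same way.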

MeSH terms

  • Algorithms*
  • Gas Chromatography-Mass Spectrometry / methods
  • Mass Spectrometry / methods
  • Metabolomics / methods
  • Software*