MESSAR: Automated recommendation of metabolite substructures from tandem mass spectra

PLoS One. 2020 Jan 16;15(1):e0226770. doi: 10.1371/journal.pone.0226770. eCollection 2020.

Abstract

Despite the increasing importance of non-targeted metabolomics to answer various life science questions, extracting biochemically relevant information from metabolomics spectral data is still an incompletely solved problem. Most computational tools to identify tandem mass spectra focus on a limited set of molecules of interest. However, such tools are typically constrained by the availability of reference spectra or molecular databases, limiting their applicability for generating structural hypotheses for unknown metabolites. In contrast, recent advances in the field illustrate the possibility of exposing the underlying biochemistry without relying on metabolite identification, in particular via substructure prediction. We describe an automated method for substructure recommendation motivated by association rule mining. Our framework captures potential relationships between spectral features and substructures learned from public spectral libraries. These associations are used to recommend substructures for any unknown mass spectrum. Our method does not require any predefined metabolite candidates, and therefore it can be used for hypothesis generation or partial identification of unknown unknowns. The method is called MESSAR (MEtabolite SubStructure Auto-Recommender) and is implemented in a free online web service available at messar.biodatamining.be.
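To illustrate the general idea of association rule mining between spectral features and substructures, the following is a minimal sketch, not MESSAR's actual implementation. It assumes that each training spectrum has already been reduced to a set of discrete features (the hypothetical labels `frag:77.04` for a binned fragment m/z and `loss:18.01` for a neutral loss) and that each reference compound is annotated with a set of substructure labels (the hypothetical `phenyl`, `hydroxyl`, and so on). The functions `mine_rules` and `recommend`, the thresholds, and the scoring scheme are illustrative assumptions. Only single-feature to single-substructure rules are mined, scored by support (co-occurrence count), confidence P(substructure | feature), and lift (confidence divided by the substructure's base rate).

```python
from collections import Counter
from itertools import product


def mine_rules(records, min_support=2, min_confidence=0.5):
    """Mine hypothetical feature -> substructure rules from annotated spectra.

    records: list of (features, substructures) pairs, each a set of strings.
    Returns {(feature, substructure): (support, confidence, lift)}.
    """
    n = len(records)
    feat_count, sub_count, pair_count = Counter(), Counter(), Counter()
    for features, subs in records:
        feat_count.update(features)
        sub_count.update(subs)
        # Count every (feature, substructure) co-occurrence in this record.
        pair_count.update(product(features, subs))

    rules = {}
    for (f, s), both in pair_count.items():
        if both < min_support:
            continue
        confidence = both / feat_count[f]          # P(s | f)
        if confidence < min_confidence:
            continue
        lift = confidence / (sub_count[s] / n)     # P(s | f) / P(s)
        rules[(f, s)] = (both, confidence, lift)
    return rules


def recommend(features, rules, top_k=3):
    """Rank substructures for an unknown spectrum by its best matching rule.

    Ties on confidence are broken by lift, then alphabetically.
    """
    scores = {}
    for (f, s), (_, confidence, lift) in rules.items():
        if f in features:
            scores[s] = max(scores.get(s, (0.0, 0.0)), (confidence, lift))
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1][0], -kv[1][1], kv[0]))
    return [(s, conf, lift) for s, (conf, lift) in ranked[:top_k]]


# Toy, made-up training set: (spectral features, annotated substructures).
records = [
    ({"frag:77.04", "loss:18.01"}, {"phenyl", "hydroxyl"}),
    ({"frag:77.04", "frag:105.03"}, {"phenyl", "carbonyl"}),
    ({"loss:18.01", "loss:44.00"}, {"hydroxyl", "carboxyl"}),
    ({"frag:77.04", "loss:44.00"}, {"phenyl", "carboxyl"}),
]

rules = mine_rules(records)
unknown = {"frag:77.04", "loss:44.00"}  # features of a hypothetical unknown spectrum
for name, conf, lift in recommend(unknown, rules):
    print(f"{name}: confidence={conf:.2f}, lift={lift:.2f}")
```

On this toy data, three rules survive the support threshold, and the unknown spectrum is assigned `carboxyl` (lift 2.00) ahead of `phenyl` (lift 1.33), both with confidence 1.00. Note that no candidate structure is needed at prediction time, which mirrors the property the abstract emphasizes; a real system would also need larger feature sets, multi-item rules, and statistical filtering of spurious associations.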

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Automation
  • Biological Products / analysis*
  • Databases, Factual*
  • Humans
  • Metabolome*
  • Pharmaceutical Preparations / analysis*
  • Tandem Mass Spectrometry / methods*

Substances

  • Biological Products
  • Pharmaceutical Preparations

Grants and funding

WB is a postdoctoral researcher of the Research Foundation – Flanders (FWO). WB was supported by a post-doctoral fellowship of the Belgian American Educational Foundation (BAEF). TD, ER, KL, and DV were supported by the VLAIO (Vlaams Agentschap Innovatie & Ondernemen) grant HBC.2016.0890. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation – Flanders (FWO) and the Flemish Government – department EWI. TDV and EPR are employed by and receive salary from Janssen Pharmaceutica NV. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.