Automating methods for estimating metabolite volatility

Front Microbiol. 2023 Dec 14:14:1267234. doi: 10.3389/fmicb.2023.1267234. eCollection 2023.

Abstract

The volatility of metabolites can influence their biological roles and inform optimal methods for their detection. Yet volatility information is not readily available for the large number of described metabolites, which limits the exploration of volatility as a fundamental trait of metabolites. Here, we adapted methods that estimate vapor pressure from the functional group composition of individual molecules (SIMPOL.1) to predict the gas-phase partitioning of compounds in different environments. We implemented these methods in a new open pipeline called volcalc, which uses chemoinformatic tools to automate these volatility estimates for all metabolites in an extensive and continuously updated pathway database, the Kyoto Encyclopedia of Genes and Genomes (KEGG), which connects metabolites, organisms, and reactions. We first benchmark the automated pipeline against a manually curated data set and show that it predicts the same volatility category (e.g., nonvolatile, low, moderate, high) for 93% of compounds. We then demonstrate how volcalc might be used to generate and test hypotheses about the role of volatility in biological systems and organisms. Specifically, we estimate that between 3.4% (soil) and 26.6% (clean atmosphere) of compounds in KEGG have high volatility, depending on the environment, and that a core set of volatiles (30%) is shared among all domains of life, with the largest proportion of kingdom-specific volatiles identified in bacteria. With volcalc, we lay a foundation for uncovering the role of the volatilome using an approach that is easily integrated with other bioinformatic pipelines and can be continually refined to consider additional dimensions of volatility. The volcalc package is an accessible tool to help design and test hypotheses on volatile metabolites and their unique roles in biological systems.
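The group-contribution idea behind SIMPOL.1 can be sketched as follows: the base-10 log of vapor pressure is a sum of per-group terms, each weighted by how many times that functional group occurs in the molecule, plus a constant "zeroeth" term. The estimate is then converted to a saturation mass concentration and binned into a volatility category. This is a minimal Python sketch, not the volcalc code. The coefficient table `TOY_COEFFS` and the cut points `TOY_THRESHOLDS` are hypothetical placeholders (not the published SIMPOL.1 coefficients or volcalc's environment-specific thresholds), and all function names are illustrative.

```python
import math

# Gas constant in m^3 atm mol^-1 K^-1, so p (atm) / (R * T) gives mol m^-3.
R_M3_ATM = 8.20574e-5


def group_term(coeffs, temp_k):
    """SIMPOL.1-style temperature dependence: b(T) = B1/T + B2 + B3*T + B4*ln(T)."""
    b1, b2, b3, b4 = coeffs
    return b1 / temp_k + b2 + b3 * temp_k + b4 * math.log(temp_k)


def log10_vapor_pressure(group_counts, coeff_table, temp_k=293.15):
    """Sum nu_k * b_k(T) over functional groups; the zeroeth group counts once."""
    total = group_term(coeff_table["zeroeth"], temp_k)
    for group, count in group_counts.items():
        total += count * group_term(coeff_table[group], temp_k)
    return total  # log10 of vapor pressure in atm


def log10_saturation_conc(log10_p_atm, molar_mass_g, temp_k=293.15):
    """log10 of C* in ug m^-3, where C* = 1e6 * M * p / (R * T)."""
    return 6 + math.log10(molar_mass_g) + log10_p_atm - math.log10(R_M3_ATM * temp_k)


def volatility_category(log_c, thresholds):
    """Bin log10(C*) using three ascending cut points (environment-specific)."""
    labels = ["nonvolatile", "low", "moderate", "high"]
    for label, cut in zip(labels, thresholds):
        if log_c < cut:
            return label
    return labels[-1]


# HYPOTHETICAL constant-only coefficients (B1, B2, B3, B4), for illustration only.
TOY_COEFFS = {
    "zeroeth": (0.0, 2.0, 0.0, 0.0),
    "carbon": (0.0, -0.5, 0.0, 0.0),
    "hydroxyl": (0.0, -2.0, 0.0, 0.0),
}
# HYPOTHETICAL cut points on log10(C*); a real pipeline would use one set per environment.
TOY_THRESHOLDS = (-2.0, 0.0, 2.0)

if __name__ == "__main__":
    # An ethanol-like molecule: two carbons and one hydroxyl group.
    log_p = log10_vapor_pressure({"carbon": 2, "hydroxyl": 1}, TOY_COEFFS)  # -1.0 with these toy values
    log_c = log10_saturation_conc(log_p, molar_mass_g=46.07)
    print(log_p, round(log_c, 2), volatility_category(log_c, TOY_THRESHOLDS))
```

Because the same `log_c` can fall into different bins under different cut points, passing the thresholds as a parameter mirrors the abstract's point that the share of high-volatility compounds depends on the environment considered (soil vs. clean atmosphere).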

Keywords: VOCs; bioinformatics; chemoinformatics; metabolic database; volatile metabolite; volatility.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This material is based upon work primarily supported by the National Science Foundation (NSF) under Grant Nos. 2034192 and 2045332 to LM and by the University of Arizona’s College of Agriculture, Life and Environmental Science (CALES) and the Arizona Experiment Station Data Science Incubator program to LM. This work was additionally supported by the Department of Energy’s Office of Biological and Environmental Research Grant DE-SC0021349 to MT, and by the Arizona University Fellowship to SL.