Sparse PLS-Based Method for Overlapping Metabolite Set Enrichment Analysis

J Proteome Res. 2021 Jun 4;20(6):3204-3213. doi: 10.1021/acs.jproteome.1c00064. Epub 2021 May 18.

Abstract

Metabolite set enrichment analysis (MSEA) has gained increasing research interest for identification of perturbed metabolic pathways in metabolomics. The method incorporates predefined metabolic pathways information in the analysis where metabolite sets are typically assumed to be mutually exclusive to each other. However, metabolic pathways are known to contain common metabolites and intermediates. This situation, along with limitations in metabolite detection or coverage leads to overlapping, incomplete metabolite sets in pathway analysis. For overlapping metabolite sets, MSEA tends to result in high false positives due to improper weights allocated to the overlapping metabolites. Here, we proposed an extended partial least squares (PLS) model with a new sparse scheme for overlapping metabolite set enrichment analysis, named overlapping group PLS (ogPLS) analysis. The weight vector of the ogPLS model was decomposed into pathway-specific subvectors, and then a group lasso penalty was imposed on these subvectors to achieve a proper weight allocation for the overlapping metabolites. Two strategies were adopted in the proposed ogPLS model to identify the perturbed metabolic pathways. The first strategy involves debiasing regularization, which was used to reduce inequalities amongst the predefined metabolic pathways. The second strategy is stable selection, which was used to rank pathways while avoiding the nuisance problems of model parameter optimization. Both simulated and real-world metabolomic datasets were used to evaluate the proposed method and compare with two other MSEA methods including Global-test and the multiblock PLS (MB-PLS)-based pathway importance in projection (PIP) methods. Using a simulated dataset with known perturbed pathways, the average true discovery rate for the ogPLS method was found to be higher than the Global-test and the MB-PLS-based PIP methods. Analysis with a real-world metabolomics dataset also indicated that the developed method was less prone to select pathways with highly overlapped detected metabolite sets. Compared with the two other methods, the proposed method features higher accuracy, lower false-positive rate, and is more robust when applied to overlapping metabolite set analysis. The developed ogPLS method may serve as an alternative MSEA method to facilitate biological interpretation of metabolomics data for overlapping metabolite sets.

Keywords: debiasing regularization; group lasso; metabolite set enrichment analysis (MSEA); overlapping group partial least squares (ogPLS); stable selection.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Least-Squares Analysis
  • Metabolic Networks and Pathways*
  • Metabolomics*