Bayesian multiple hypotheses testing in compositional analysis of untargeted metabolomic data

Anal Chim Acta. 2020 Feb 8:1097:49-61. doi: 10.1016/j.aca.2019.11.006. Epub 2019 Nov 16.

Abstract

Clinical metabolomics aims to find statistically significant differences between the metabolic statuses of patient and control groups, with the intention of understanding pathobiochemical processes and identifying clinically useful biomarkers of particular diseases. After the raw measurements are integrated and pre-processed as intensities of chromatographic peaks, the differences between controls and patients are evaluated by both univariate and multivariate statistical methods. The traditional univariate approach relies on t-tests (or their nonparametric alternatives), and the results from multiple testing are misleadingly compared merely by p-values using the so-called volcano plot. This paper proposes a Bayesian counterpart to the widespread univariate analysis, taking into account the compositional character of a metabolome. Since each metabolome is a collection of small-molecule metabolites in a biological material, the relative structure of metabolomic data, which is inherently contained in the ratios between metabolites, is of primary interest. Therefore, a proper choice of logratio coordinates is an essential step in any statistical analysis of such data. In addition, a concept of b-values is introduced together with a Bayesian version of the volcano plot incorporating distance levels of the posterior highest density intervals from zero. The theoretical background of the contribution is illustrated using two data sets containing samples of patients suffering from 3-hydroxy-3-methylglutaryl-CoA lyase deficiency and medium-chain acyl-CoA dehydrogenase deficiency. To evaluate the stability of the proposed method as well as the benefits of the compositional approach, two simulations are added, designed to mimic a loss of samples and a systematic measurement error, respectively.
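The two ideas named in the abstract, logratio coordinates and the distance of a posterior highest density interval (HDI) from zero, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: the centered logratio (clr) is used as one common choice of logratio coordinates, each group mean is given a normal model with the standard noninformative prior (so its posterior is a shifted, scaled Student t), the data are synthetic, and the helper names are hypothetical. The sketch does not compute the paper's b-values, whose definition the abstract does not give.

```python
import numpy as np

def clr(X):
    """Centered logratio coordinates: log of each part minus the row mean of logs
    (i.e. log of each part over the geometric mean of its sample)."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def posterior_mean_draws(y, n_draws, rng):
    """Posterior draws of a group mean, ASSUMING a normal model with the
    noninformative prior p(mu, sigma^2) proportional to 1/sigma^2."""
    n = y.size
    scale = y.std(ddof=1) / np.sqrt(n)
    return y.mean() + scale * rng.standard_t(n - 1, size=n_draws)

def hdi(draws, mass=0.95):
    """Shortest interval that contains a fraction `mass` of the draws."""
    s = np.sort(draws)
    k = int(np.ceil(mass * s.size))           # number of draws inside the interval
    widths = s[k - 1:] - s[:s.size - k + 1]   # width of every window of k draws
    i = int(np.argmin(widths))
    return float(s[i]), float(s[i + k - 1])

def hdi_distance_from_zero(lo, hi):
    """0 if the interval covers zero, otherwise the gap to its nearer endpoint."""
    if lo > 0:
        return lo
    if hi < 0:
        return -hi
    return 0.0

# Synthetic peak intensities (hypothetical): 20 samples per group, 5 metabolites.
rng = np.random.default_rng(0)
n_per_group, n_metabolites = 20, 5
controls = rng.lognormal(mean=2.0, sigma=0.1, size=(n_per_group, n_metabolites))
patients = rng.lognormal(mean=2.0, sigma=0.1, size=(n_per_group, n_metabolites))
patients[:, 0] *= 4.0  # metabolite 0 is elevated in the patient group

# A sample-specific scaling (e.g. dilution) does not change logratio coordinates.
dilution = rng.uniform(0.5, 2.0, size=(n_per_group, 1))
clr_controls = clr(controls)
clr_patients = clr(patients * dilution)

distances = []
for j in range(n_metabolites):
    diff = (posterior_mean_draws(clr_patients[:, j], 20000, rng)
            - posterior_mean_draws(clr_controls[:, j], 20000, rng))
    lo, hi = hdi(diff)
    distances.append(hdi_distance_from_zero(lo, hi))
    print(f"metabolite {j}: 95% HDI = [{lo:.3f}, {hi:.3f}], "
          f"distance from zero = {distances[-1]:.3f}")
```

Because clr coordinates are expressed relative to the geometric mean of all parts, raising one metabolite also shifts the coordinates of the others downward. This is one reason the choice of logratio coordinates matters for interpretation.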

Keywords: Bayesian inference; Compositional data; High-dimensional data; Multiple hypotheses testing; Untargeted metabolomics; Volcano plot.

MeSH terms

  • Acetyl-CoA C-Acetyltransferase / deficiency*
  • Acetyl-CoA C-Acetyltransferase / metabolism
  • Acyl-CoA Dehydrogenase / deficiency*
  • Acyl-CoA Dehydrogenase / metabolism
  • Amino Acid Metabolism, Inborn Errors / metabolism*
  • Bayes Theorem*
  • Datasets as Topic
  • Humans
  • Lipid Metabolism, Inborn Errors / metabolism*
  • Metabolomics*

Substances

  • Acyl-CoA Dehydrogenase
  • Acetyl-CoA C-Acetyltransferase

Supplementary concepts

  • 3-Hydroxy-3-Methylglutaryl-CoA Lyase Deficiency
  • Medium chain acyl CoA dehydrogenase deficiency