Multivariate strategy for the sample selection and integration of multi-batch data in metabolomics

Metabolomics. 2017;13(10):114. doi: 10.1007/s11306-017-1248-1. Epub 2017 Aug 24.

Abstract

Introduction: Availability of large cohorts of samples with related metadata provides scientists with extensive material for studies. At the same time, recent development of modern high-throughput 'omics' technologies, including metabolomics, has resulted in the potential for analysis of large sample sizes. Representative subset selection becomes critical for selection of samples from bigger cohorts and their division into analytical batches. This especially holds true when relative quantification of compound levels is used.

Objectives: We present a multivariate strategy for representative sample selection and integration of results from multi-batch experiments in metabolomics.

Methods: Multivariate characterization was applied for design of experiment based sample selection and subsequent subdivision into four analytical batches which were analyzed on different days by metabolomics profiling using gas-chromatography time-of-flight mass spectrometry (GC-TOF-MS). For each batch OPLS-DA® was used and its p(corr) vectors were averaged to obtain combined metabolic profile. Jackknifed standard errors were used to calculate confidence intervals for each metabolite in the average p(corr) profile.

Results: A combined, representative metabolic profile describing differences between systemic lupus erythematosus (SLE) patients and controls was obtained and used for elucidation of metabolic pathways that could be disturbed in SLE.

Conclusion: Design of experiment based representative sample selection ensured diversity and minimized bias that could be introduced at this step. Combined metabolic profile enabled unified analysis and interpretation.

Keywords: Metabolomics; Multi-batch analysis; OPLS; Representative sample selection.