Validated and predictive processing of gas chromatography-mass spectrometry based metabolomics data for large scale screening studies, diagnostics and metabolite pattern verification

Metabolites. 2012 Oct 31;2(4):796-817. doi: 10.3390/metabo2040796.

Abstract

The suggested approach makes it feasible to screen large metabolomics sample sets while retaining data quality, or to retrieve significant metabolic information from small sample sets in a form that can be verified across multiple studies. Hierarchical multivariate curve resolution (H-MCR), followed by orthogonal partial least squares discriminant analysis (OPLS-DA), was used to process and classify gas chromatography/time-of-flight mass spectrometry (GC/TOFMS) data characterizing human serum samples collected in a study of strenuous physical exercise. Predictive H-MCR processing of representative sample subsets, selected by chemometric approaches, was shown to generate high-quality data efficiently. Extensive model validation by cross-validation and external predictions verified the robustness of the extracted metabolite patterns. Comparisons of the extracted metabolite patterns between models emphasized the reliability of the methodology in a biological context. Furthermore, the high predictive power in longitudinal data supports the potential use of the approach in clinical diagnosis. Finally, the predictive metabolite pattern was interpreted physiologically, highlighting the biological relevance of the diagnostic pattern.
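The abstract does not state which chemometric approach was used to select the representative sample subsets. As an illustration only, the sketch below uses the Kennard-Stone algorithm, a common chemometric selection method, which is an assumption here and not necessarily the authors' procedure. The function name `kennard_stone` and the toy `scores` matrix (imagined as two score values per sample) are hypothetical.

```python
import math


def kennard_stone(samples, k):
    """Return the indices of k samples that span the data space.

    samples: list of equal-length numeric feature lists (one per sample).
    Illustrative Kennard-Stone selection; not taken from the paper.
    """
    n = len(samples)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")

    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(a, b) for b in samples] for a in samples]

    # Seed the subset with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]

    while len(selected) < k:
        # Add the sample whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Hypothetical toy data: six samples described by two score values each.
scores = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [2.5, 2.4]]
subset = kennard_stone(scores, 3)
print(subset)  # [0, 4, 2]
```

In a workflow like the one described, the selected subset would be processed in full and the remaining samples processed predictively; that step is not shown here.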