Automated supervised learning pipeline for non-targeted GC-MS data analysis

Anal Chim Acta X. 2019 Jan 10:1:100005. doi: 10.1016/j.acax.2019.100005. eCollection 2019 Mar.

Abstract

Non-targeted analysis is now applied in many domains of analytical chemistry, such as metabolomics, environmental analysis, and food analysis. Conventional processing strategies for GC-MS data include baseline correction, feature detection, and retention time alignment before multivariate modeling. These techniques can be error-prone, so time-consuming manual corrections are generally necessary. Here we introduce a novel, fully automated approach to non-targeted GC-MS data processing that avoids feature extraction and retention time alignment. Supervised machine learning on decomposed tensors of the segmented raw chromatographic signal is used to rank the regions of the chromatograms that contribute to differentiation between sample classes. The performance of this approach is demonstrated on three published datasets.
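The general idea of segmenting the raw signal, decomposing each segment, and ranking segments by how well they separate the classes can be illustrated with a minimal sketch. It is not the authors' pipeline. Everything in it is an assumption made for illustration: the synthetic data, the fixed-width segmentation, the use of a truncated SVD of the sample-mode unfolding in place of the tensor decomposition, the leave-one-out nearest-centroid classifier in place of the supervised learner, and all function names and parameters.

```python
import numpy as np

def make_synthetic_runs(n_per_class=10, n_scans=120, n_mz=30, seed=0):
    """Hypothetical GC-MS-like data: samples x scans x m/z channels."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_class
    labels = np.repeat([0, 1], n_per_class)
    data = rng.normal(0.0, 0.05, size=(n, n_scans, n_mz))
    scans = np.arange(n_scans)
    spectrum_common = rng.random(n_mz)
    spectrum_marker = rng.random(n_mz)
    for i in range(n):
        # Peak shared by both classes, with a small unaligned retention shift.
        shift = rng.integers(-2, 3)
        peak = np.exp(-0.5 * ((scans - 30 - shift) / 2.0) ** 2)
        data[i] += np.outer(peak, spectrum_common)
        # Marker peak near scan 90, five times larger in class 1.
        shift = rng.integers(-2, 3)
        peak = np.exp(-0.5 * ((scans - 90 - shift) / 2.0) ** 2)
        amplitude = 1.0 if labels[i] == 1 else 0.2
        data[i] += amplitude * np.outer(peak, spectrum_marker)
    return data, labels

def sample_mode_scores(segment, n_components=3):
    """Truncated SVD of the sample-mode unfolding (assumed stand-in
    for a tensor decomposition such as PARAFAC or Tucker)."""
    unfolded = segment.reshape(segment.shape[0], -1)
    unfolded = unfolded - unfolded.mean(axis=0)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def loo_nearest_centroid_accuracy(scores, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    n = len(labels)
    correct = 0
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False
        classes = np.unique(labels[mask])
        centroids = np.array(
            [scores[mask & (labels == c)].mean(axis=0) for c in classes]
        )
        dists = np.linalg.norm(centroids - scores[i], axis=1)
        correct += int(classes[np.argmin(dists)] == labels[i])
    return correct / n

def rank_segments(data, labels, n_segments=6):
    """Split the retention axis into equal windows and rank them
    by how well their decomposition scores separate the classes."""
    bounds = np.linspace(0, data.shape[1], n_segments + 1).astype(int)
    accuracy = np.array([
        loo_nearest_centroid_accuracy(sample_mode_scores(data[:, a:b, :]), labels)
        for a, b in zip(bounds[:-1], bounds[1:])
    ])
    ranking = np.argsort(-accuracy, kind="stable")
    return ranking, accuracy, bounds

data, labels = make_synthetic_runs()
ranking, accuracy, bounds = rank_segments(data, labels)
for seg in ranking:
    print(f"scans {bounds[seg]}-{bounds[seg + 1] - 1}: LOO accuracy {accuracy[seg]:.2f}")
```

In this toy setting the window containing the marker peak (scans 80 to 99) should rank first, even though the peaks are not aligned and no peak picking is performed. The decomposition here is fitted on all samples before cross-validation, which is acceptable for a sketch because it uses no labels, but a real evaluation would keep held-out samples out of every fitting step.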

Keywords: Chemometrics; Classification; Exploratory data analysis; Machine learning; Metabolomics; Tensor decomposition.