Data size reduction strategy for the classification of breath and air samples using multicapillary column-ion mobility spectrometry

Anal Chem. 2015 Jan 20;87(2):869-75. doi: 10.1021/ac503857y. Epub 2015 Jan 8.

Abstract

Ion mobility spectrometry combined with multicapillary column separation (MCC-IMS) is a well-known technology for detecting volatile organic compounds (VOCs) in gaseous samples. Because MCC-IMS spectra are large, processing them remains the main bottleneck of data analysis, and there is a growing need for strategies that reduce the size of MCC-IMS data to enable further analysis. In this study, the first untargeted chemometric strategy is developed and applied to MCC-IMS spectra from 264 breath and ambient air samples. The strategy does not require compound identification as a primary step; instead, it comprises several preprocessing steps followed by a discriminant analysis. Data size is reduced substantially in three steps: wavelet transform, mask construction, and sparse partial least squares-discriminant analysis (s-PLS-DA), which together reduce the data to as few as 50 variables relevant to the goal of the analysis. The influence and compatibility of the data reduction tools are studied by applying different settings of the developed strategy. Loss of information after preprocessing is evaluated, e.g., by comparing the performance of classification models for different classes of samples. Finally, the interpretability of the classification models is evaluated, and regions of the spectra that are related to the identification of potential analytical biomarkers are successfully determined. This work will facilitate the standardization of analytical procedures across different instrument types, promoting the adoption of MCC-IMS technology in a wide range of application fields.
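
The three reduction steps named in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' implementation. The abstract does not state the wavelet family, the mask rule, or the s-PLS-DA settings, so the one-level Haar approximation, the intensity-threshold mask, the single soft-thresholded loading vector, and all function names below are assumptions chosen only to show how each step shrinks the number of variables.

```python
import random


def haar_approx_2d(spectrum):
    """One level of a 2D Haar transform, keeping only the approximation band.

    Each 2x2 block collapses to one coefficient (block sum / 2),
    so the matrix holds a quarter of the original values.
    """
    rows = len(spectrum) // 2 * 2
    cols = len(spectrum[0]) // 2 * 2
    return [
        [
            (spectrum[r][c] + spectrum[r][c + 1]
             + spectrum[r + 1][c] + spectrum[r + 1][c + 1]) / 2.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def build_mask(spectra, threshold):
    """Assumed mask rule: keep a position if any sample exceeds the threshold."""
    n_rows, n_cols = len(spectra[0]), len(spectra[0][0])
    return [
        [any(s[r][c] > threshold for s in spectra) for c in range(n_cols)]
        for r in range(n_rows)
    ]


def apply_mask(spectrum, mask):
    """Flatten a spectrum into a vector of the masked-in positions only."""
    return [
        value
        for row, mask_row in zip(spectrum, mask)
        for value, keep in zip(row, mask_row)
        if keep
    ]


def sparse_loading(X, y, n_keep):
    """First sparse PLS-style loading vector for a binary class label.

    The covariance of each centered variable with the centered label is
    soft-thresholded so that at most n_keep weights stay nonzero.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    cov = [
        sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    magnitudes = sorted((abs(c) for c in cov), reverse=True)
    cutoff = magnitudes[n_keep] if n_keep < p else 0.0
    weights = [
        (abs(c) - cutoff) * (1.0 if c > 0 else -1.0) if abs(c) > cutoff else 0.0
        for c in cov
    ]
    norm = sum(w * w for w in weights) ** 0.5
    return [w / norm for w in weights] if norm > 0 else weights


def make_spectrum(rng, has_marker):
    """Synthetic 8x8 'spectrum': noise, a peak in every sample, and an
    extra peak that appears only in the marker class."""
    spectrum = [[rng.uniform(0.0, 0.1) for _ in range(8)] for _ in range(8)]
    for r in (6, 7):
        for c in (0, 1):
            spectrum[r][c] += 3.0
    if has_marker:
        for r in (2, 3):
            for c in (4, 5):
                spectrum[r][c] += 5.0
    return spectrum


rng = random.Random(0)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
raw = [make_spectrum(rng, label == 1) for label in labels]

compressed = [haar_approx_2d(s) for s in raw]      # step 1: wavelet transform
mask = build_mask(compressed, threshold=1.0)       # step 2: mask construction
X = [apply_mask(s, mask) for s in compressed]
weights = sparse_loading(X, labels, n_keep=1)      # step 3: sparse selection
selected = [j for j, w in enumerate(weights) if w != 0.0]

print(len(raw[0]) * len(raw[0][0]), "->", len(X[0]), "->", len(selected))
```

In this toy setting the mask discards positions that contain only noise, and the sparse step then drops the peak shared by both classes and retains the class-specific one. A full s-PLS-DA would fit several components with deflation and tune the number of retained variables by cross-validation, which the sketch omits.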