Data Pre-Processing of Infrared Spectral Breathprints for Lung Cancer Detection

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov:2021:1353-1357. doi: 10.1109/EMBC46164.2021.9629690.

Abstract

Though breath analysis shows promise as a noninvasive and cost-effective approach to lung cancer screening, biomarkers in exhaled breath samples can be overwhelmed by irrelevant internal and environmental volatile organic compounds (VOCs). These extraneous VOCs can obscure the disease signature in a spectral breathprint, hindering the performance of pattern recognition models. In this work, pre-processing pipelines consisting of missing value replacement, detrending, and normalization techniques were evaluated to reduce these effects and enhance the features of interest in infrared cavity ring-down spectra. The best performing pipeline consisted of moving average detrending, linear interpolation for missing values, and vector normalization. This pipeline achieved an average accuracy of 73.04% across five types of classifiers, exhibiting an 8.36% improvement compared to a baseline model (p < 0.05). A linear support vector machine classifier yielded the best performance (79.75% accuracy, 67.74% sensitivity, 87.50% specificity). This work can serve to guide pre-processing in future lung cancer breath research and, more broadly, in infrared laser absorption spectroscopy.
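
The three steps named in the best performing pipeline can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions, since the abstract does not state them: missing values are encoded as NaN, the moving average uses a centered odd-length window (default 5) with edge padding, "vector normalization" is taken to mean scaling to unit Euclidean (L2) norm, and the steps run in the order interpolation, detrending, normalization. All function names are hypothetical.

```python
import numpy as np


def interpolate_missing(spectrum):
    """Fill NaN entries by linear interpolation between valid neighbors.

    Assumption: missing values are marked as NaN. Gaps at either end
    take the nearest valid value, because np.interp clamps at the edges.
    """
    y = np.asarray(spectrum, dtype=float).copy()
    idx = np.arange(y.size)
    missing = np.isnan(y)
    if missing.all():
        raise ValueError("spectrum contains no valid points")
    y[missing] = np.interp(idx[missing], idx[~missing], y[~missing])
    return y


def moving_average_detrend(spectrum, window=5):
    """Subtract a centered moving-average baseline from the spectrum.

    Assumption: odd window length, with the signal edge-padded so the
    estimated trend has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    y = np.asarray(spectrum, dtype=float)
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    return y - trend


def vector_normalize(spectrum):
    """Scale the spectrum to unit Euclidean norm (left as-is if all zero)."""
    y = np.asarray(spectrum, dtype=float)
    norm = np.linalg.norm(y)
    return y / norm if norm > 0 else y


def preprocess(spectrum, window=5):
    """Chain the three steps in the assumed order."""
    filled = interpolate_missing(spectrum)
    detrended = moving_average_detrend(filled, window=window)
    return vector_normalize(detrended)
```

Under these assumptions, calling preprocess on a one-dimensional array of ring-down measurements returns an array of the same length with no NaN entries and unit L2 norm, which could then be passed to a classifier such as a linear support vector machine.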

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breath Tests
  • Early Detection of Cancer
  • Exhalation
  • Humans
  • Lung Neoplasms* / diagnosis
  • Volatile Organic Compounds*

Substances

  • Volatile Organic Compounds