A factorization method for the classification of infrared spectra

BMC Bioinformatics. 2010 Nov 15;11:561. doi: 10.1186/1471-2105-11-561.

Abstract

Background: Bioinformatics data analysis often deals with additive mixtures of signals for which only class labels are known. The overall goal is then to estimate class-related signals for data mining purposes. A convenient application is metabolic monitoring of patients using infrared spectroscopy. Within an infrared spectrum, each compound contributes quantitatively to the measurement.
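The additive-mixture assumption in the Background can be illustrated with a minimal sketch. This is not the paper's factorization method: it assumes the pure compound spectra are already known and recovers the concentrations by ordinary least squares, whereas the paper learns from class labels alone. The Gaussian peak shapes, the wavenumber grid, and all function names below are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_peak(wavenumbers, center, width):
    """Hypothetical absorption band: a Gaussian centered at `center`."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)


def mix_spectra(concentrations, pure_spectra):
    """Additive mixture: each row is a concentration-weighted sum of pure spectra."""
    return concentrations @ pure_spectra


def estimate_concentrations(mixtures, pure_spectra):
    """Least-squares estimate of concentrations when the pure spectra are known."""
    # Solve pure_spectra.T @ c = mixture for every mixture at once.
    solution, *_ = np.linalg.lstsq(pure_spectra.T, mixtures.T, rcond=None)
    return solution.T


# Illustrative grid and three made-up compound spectra (assumptions, not real data).
wavenumbers = np.linspace(900.0, 1800.0, 200)
pure_spectra = np.stack([
    gaussian_peak(wavenumbers, 1050.0, 30.0),
    gaussian_peak(wavenumbers, 1400.0, 40.0),
    gaussian_peak(wavenumbers, 1650.0, 25.0),
])

# Two synthetic samples with known concentrations of the three compounds.
true_concentrations = np.array([[1.0, 0.5, 0.0],
                                [0.2, 0.0, 2.0]])
mixtures = mix_spectra(true_concentrations, pure_spectra)
estimated = estimate_concentrations(mixtures, pure_spectra)
```

In this noise-free setting the estimated concentrations match the true ones up to floating-point error, which is what "contributes quantitatively" means for a linear mixture.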

Results: In this work, we propose a novel technique for additive signal factorization that allows learning from classified samples. We define a composed loss function for this task and analytically derive a closed-form equation such that training a model reduces to searching for an optimal threshold vector. Our experiments, carried out on synthetic and clinical data, show a sensitivity of up to 0.958 and a specificity of up to 0.841 for a 15-class problem of disease classification. Using class and regression information in parallel, our algorithm outperforms a linear SVM in training scenarios with many classes and little data.
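For readers unfamiliar with sensitivity and specificity in a multi-class setting, the sketch below computes both per class using a one-vs-rest convention. The abstract does not state how the reported figures were computed or aggregated over the 15 classes, so the one-vs-rest choice, the function name, and the toy labels are all assumptions made for illustration.

```python
import numpy as np


def per_class_sensitivity_specificity(y_true, y_pred, n_classes):
    """One-vs-rest sensitivity and specificity for each class label 0..n_classes-1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    sensitivity = np.full(n_classes, np.nan)
    specificity = np.full(n_classes, np.nan)
    for k in range(n_classes):
        positive = y_true == k    # samples that truly belong to class k
        predicted = y_pred == k   # samples assigned to class k
        tp = np.sum(positive & predicted)
        fn = np.sum(positive & ~predicted)
        tn = np.sum(~positive & ~predicted)
        fp = np.sum(~positive & predicted)
        # Leave NaN when a class has no positives (or no negatives).
        if tp + fn > 0:
            sensitivity[k] = tp / (tp + fn)
        if tn + fp > 0:
            specificity[k] = tn / (tn + fp)
    return sensitivity, specificity


# Toy labels for a 3-class problem (made up, not from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
sens, spec = per_class_sensitivity_specificity(y_true, y_pred, 3)
```

For these toy labels the per-class sensitivities are 0.5, 1.0, and 0.5, and the specificities are 0.75, 0.75, and 1.0.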

Conclusions: The presented factorization method provides a simple and generative model and, therefore, represents a first step towards predictive factorization methods.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Databases, Factual
  • Models, Theoretical
  • Sensitivity and Specificity
  • Signal Processing, Computer-Assisted*
  • Spectrophotometry, Infrared / methods*