Combining local wavelength information and ensemble learning to enhance the specificity of class modeling techniques: Identification of food geographical origins and adulteration

Anal Chim Acta. 2012 Nov 19:754:31-8. doi: 10.1016/j.aca.2012.10.011. Epub 2012 Oct 12.

Abstract

Class modeling techniques are required to tackle various one-class problems. Because class models are trained on the target class alone and the origins of future test objects usually cannot be predefined exactly, the criteria for selecting features for class models are not straightforward. Although feature reduction can be expected to improve class model performance, retaining more features helps ensure a sufficient description of the sought-for class. This paper suggests a strategy to balance class description and model specificity by ensemble learning of sub-models built on separate local wavelength intervals. The acceptance or rejection of a future object can be explicitly determined by examining how frequently it is accepted by the sub-models. Because little is known about the independence of the sub-models, we propose a data-driven method that controls the sensitivity of the ensemble model by cross validation. In this way, all wavelength intervals contribute to the class description, while the local wavelength intervals are individually emphasized to improve the detection of out-of-class objects. The proposed strategy was applied to one-class partial least squares (OCPLS) and soft independent modeling of class analogy (SIMCA). Analysis of two infrared spectral data sets, one for identifying the geographical origin of white tea and the other for discriminating adulterations in pure sesame oil, showed that the proposed ensemble class modeling method has similar sensitivity and better specificity than total-spectrum SIMCA and OCPLS models. The results indicate that local spectral information can be extracted to enhance class model specificity.
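The ensemble decision rule summarized above (one sub-model per local wavelength interval, acceptance frequency, sensitivity controlled by cross validation) can be sketched as follows. This is a minimal illustration under assumptions that do not come from the paper: each sub-model is a simple autoscaled distance-to-class-mean check on one contiguous interval, standing in for the OCPLS or SIMCA sub-models; its acceptance limit is the largest distance seen in training; the vote threshold `k` is chosen by leave-one-out cross validation as the largest value that keeps sensitivity at or above a target; and the data are synthetic. Names such as `IntervalModel`, `choose_threshold` and `target_sens` are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def split_intervals(n_vars, n_intervals):
    """Split variable indices 0..n_vars-1 into contiguous, near-equal windows."""
    bounds = [round(i * n_vars / n_intervals) for i in range(n_intervals + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(n_intervals)]


class IntervalModel:
    """Hypothetical stand-in sub-model for one wavelength interval: the RMS of
    autoscaled deviations from the class mean (not OCPLS or SIMCA)."""

    def __init__(self, rows, idx):
        self.idx = list(idx)
        cols = [[r[j] for r in rows] for j in self.idx]
        self.mu = [mean(c) for c in cols]
        self.sd = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
        # Assumed acceptance limit: the largest distance seen in training.
        self.cutoff = max(self.distance(r) for r in rows)

    def distance(self, x):
        z = [(x[j] - m) / s for j, m, s in zip(self.idx, self.mu, self.sd)]
        return math.sqrt(sum(v * v for v in z) / len(z))

    def accepts(self, x):
        return self.distance(x) <= self.cutoff


def fit_ensemble(rows, n_intervals):
    """One sub-model per local interval, all trained on target-class rows only."""
    return [IntervalModel(rows, idx) for idx in split_intervals(len(rows[0]), n_intervals)]


def acceptance_count(models, x):
    """Number of sub-models that accept x (its acceptance frequency)."""
    return sum(m.accepts(x) for m in models)


def choose_threshold(rows, n_intervals, target_sens=0.95):
    """Assumed CV scheme: leave-one-out over target-class rows, returning the
    largest vote threshold k whose held-out sensitivity reaches target_sens."""
    counts = []
    for i in range(len(rows)):
        models = fit_ensemble(rows[:i] + rows[i + 1:], n_intervals)
        counts.append(acceptance_count(models, rows[i]))
    best = 0  # k = 0 accepts everything, so it always meets the target
    for k in range(n_intervals + 1):
        if sum(c >= k for c in counts) / len(counts) >= target_sens:
            best = k
    return best


def classify(models, k, x):
    """Accept x into the target class if at least k sub-models accept it."""
    return acceptance_count(models, x) >= k


# Synthetic demo: 30 target-class "spectra" with 40 variables each.
rng = random.Random(0)
base = [math.sin(j / 6) for j in range(40)]
target = [[b + rng.gauss(0, 0.05) for b in base] for _ in range(30)]

k = choose_threshold(target, n_intervals=8, target_sens=0.90)
models = fit_ensemble(target, 8)
# Hypothetical adulterated sample: an extra band over variables 10-29 only.
bad = [b + rng.gauss(0, 0.05) + (1.0 if 10 <= j < 30 else 0.0) for j, b in enumerate(base)]
print("vote threshold k =", k)
print("accepted by", acceptance_count(models, bad), "of 8 sub-models")
print("adulterated sample accepted:", classify(models, k, bad))
```

In this sketch, cross validation tunes only the vote threshold, so the ensemble's sensitivity is set empirically rather than derived from an independence assumption about the sub-models. The paper's own data-driven procedure may tune different quantities. The sub-models covering the perturbed intervals reject the simulated sample even though the remaining intervals look like the target class.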

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Food Contamination / analysis*
  • Food*
  • Least-Squares Analysis
  • Models, Statistical