Novel search algorithms for a mid-infrared spectral library of cotton contaminants

Appl Spectrosc. 2008 Jun;62(6):661-70. doi: 10.1366/000370208784657968.

Abstract

During harvest, a variety of plant based contaminants are collected along with cotton lint. The USDA previously created a mid-infrared, attenuated total reflection (ATR), Fourier transform infrared (FT-IR) spectral library of cotton contaminants for contaminant identification as the contaminants have negative impacts on yarn quality. This library has shown impressive identification rates for extremely similar cellulose based contaminants in cases where the library was representative of the samples searched. When spectra of contaminant samples from crops grown in different geographic locations, seasons, and conditions and measured with a different spectrometer and accessories were searched, identification rates for standard search algorithms decreased significantly. Six standard algorithms were examined: dot product, correlation, sum of absolute values of differences, sum of the square root of the absolute values of differences, sum of absolute values of differences of derivatives, and sum of squared differences of derivatives. Four categories of contaminants derived from cotton plants were considered: leaf, stem, seed coat, and hull. Experiments revealed that the performance of the standard search algorithms depended upon the category of sample being searched and that different algorithms provided complementary information about sample identity. These results indicated that choosing a single standard algorithm to search the library was not possible. Three voting scheme algorithms based on result frequency, result rank, category frequency, or a combination of these factors for the results returned by the standard algorithms were developed and tested for their capability to overcome the unpredictability of the standard algorithms' performances. The group voting scheme search was based on the number of spectra from each category of samples represented in the library returned in the top ten results of the standard algorithms. This group algorithm was able to identify correctly as many test spectra as the best standard algorithm without relying on human choice to select a standard algorithm to perform the searches.

MeSH terms

  • Algorithms*
  • Cotton Fiber / classification*
  • Database Management Systems*
  • Databases, Factual*
  • Environmental Pollutants / analysis*
  • Gossypium / chemistry*
  • Spectroscopy, Fourier Transform Infrared / methods*

Substances

  • Environmental Pollutants