Analysis and comparison of machine learning methods for species identification utilizing ATR-FTIR spectroscopy

Spectrochim Acta A Mol Biomol Spectrosc. 2024 Mar 5:308:123713. doi: 10.1016/j.saa.2023.123713. Epub 2023 Dec 2.

Abstract

Accurate identification of insect species is important in many fields because it supports an understanding of their ecological habits, distribution range, and impact on both the environment and humans. Morphological characteristics have traditionally been used for species identification, but empty puparia have rarely been used for this purpose. In this study, attenuated total reflection Fourier transform infrared spectroscopy (ATR-FTIR) was used to acquire spectra from empty puparia of five fly species, and the data were pre-processed to obtain average spectra for preliminary analysis. Principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) were then used for clustering and classification. Two wavebands (3000-2800 cm⁻¹ and 1800-1300 cm⁻¹) were found to be significant in distinguishing A. grahami. We further established three machine learning models, support vector machine (SVM), k-nearest neighbors (KNN), and random forest (RF), to analyze spectra from different waveband groups. The biological fingerprint region (1800-1300 cm⁻¹) showed a substantial advantage in identifying the species of empty puparia, and the SVM model achieved an accuracy of 100% in identifying all five fly species. This study is the first to combine infrared spectroscopy and machine learning methods to identify insect species from empty puparia, and it provides a foundation for future research in this area.
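The waveband-restricted model comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the spectra are synthetic (one Gaussian band per species plus noise), the helper names `select_band`, `make_synthetic_spectra`, and `compare_models` are hypothetical, and the scikit-learn classifiers use default or arbitrarily chosen hyperparameters, since the abstract does not report pre-processing steps, model settings, or the validation scheme.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_band(wavenumbers, spectra, low, high):
    """Keep only the spectral columns whose wavenumber lies in [low, high] cm^-1."""
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return spectra[:, mask]


def make_synthetic_spectra(n_species=5, n_per_species=20, seed=0):
    """Generate toy spectra (ASSUMPTION: synthetic stand-in for real ATR-FTIR data)."""
    rng = np.random.default_rng(seed)
    wavenumbers = np.linspace(4000, 600, 851)  # hypothetical 4 cm^-1 grid
    spectra, labels = [], []
    for species in range(n_species):
        center = 1350 + 100 * species  # one band per species inside 1800-1300 cm^-1
        template = np.exp(-(((wavenumbers - center) / 40.0) ** 2))
        for _ in range(n_per_species):
            noise = rng.normal(0.0, 0.05, wavenumbers.size)
            spectra.append(template + noise)
            labels.append(species)
    return wavenumbers, np.array(spectra), np.array(labels)


def compare_models(X, y, cv=5):
    """Return the mean cross-validated accuracy of SVM, KNN, and RF on (X, y)."""
    models = {
        # Scaling matters for SVM and KNN; hyperparameters here are illustrative.
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in models.items()
    }


wavenumbers, spectra, labels = make_synthetic_spectra()
fingerprint = select_band(wavenumbers, spectra, 1300, 1800)
for name, accuracy in compare_models(fingerprint, labels).items():
    print(f"{name}: mean accuracy {accuracy:.2f}")
```

Repeating the call to `compare_models` with other bands (for example `select_band(wavenumbers, spectra, 2800, 3000)`) gives the kind of waveband-by-waveband comparison the abstract refers to. Accuracies obtained on this toy data say nothing about the results reported in the study.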

Keywords: Biological fingerprint region; Empty puparium; Fourier transform infrared; Machine learning; Species identification.

MeSH terms

  • Ataxia Telangiectasia Mutated Proteins
  • Humans
  • Machine Learning*
  • Spectrophotometry, Infrared
  • Spectroscopy, Fourier Transform Infrared / methods

Substances

  • ATR protein, human
  • Ataxia Telangiectasia Mutated Proteins