Machine learning based on structural and FTIR spectroscopic datasets for seed autoclassification

RSC Adv. 2022 Apr 12;12(18):11413-11419. doi: 10.1039/d2ra00239f. eCollection 2022 Apr 7.

Abstract

A single feature set often cannot effectively classify complex biological samples because of their similar morphology and size. This paper proposes a protocol for the fast identification of seed medicinal materials based on micro-structural and infrared spectroscopic characteristics. Three feature datasets, namely micro-CT, FTIR, and mixed, were established via principal component analysis (PCA) and competitive adaptive reweighted sampling (CARS) and then used to train a back-propagation (BP) neural network. The mixed dataset consists of 34-dimensional micro-CT eigenvalues and 13-dimensional FTIR eigenvalues, both optimized by PCA and CARS processing. The classification accuracy reached 89.5% for the micro-CT dataset and 93.3% for the FTIR dataset, while the mixed dataset achieved 99.2%, much higher than either single-feature dataset. This study provides a new protocol for multi-dimensional characteristic architecture with excellent performance for the classification and identification of Chinese medicinal materials.
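
The feature-level fusion described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the reported accuracies. It assumes synthetic data, a plain SVD-based PCA for the micro-CT block, and a fixed, hypothetical set of 13 column indices standing in for the output of CARS (CARS itself is not implemented here). The abstract does not state which method was applied to which modality, so that pairing is also an assumption. The helper names (pca_reduce, standardize, train_bp), the network size, and the learning rate are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def train_bp(X, y, n_classes, n_hidden=16, lr=0.1, epochs=200):
    """One-hidden-layer network trained by back-propagation (full batch)."""
    W1 = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot targets
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, softmax output
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        logits = logits - logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P = P / P.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
        # Backward pass: gradients of the mean cross-entropy loss
        d_logits = (P - Y) / len(X)
        dW2, db2 = H.T @ d_logits, d_logits.sum(axis=0)
        dH = (d_logits @ W2.T) * (1.0 - H ** 2)
        dW1, db1 = X.T @ dH, dH.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1, W2, b2), losses

# Synthetic stand-ins for the two measurement blocks (NOT real seed data).
n_samples, n_classes = 120, 3
y = rng.integers(0, n_classes, size=n_samples)
ct_raw = rng.normal(size=(n_samples, 60)) + 0.5 * y[:, None]
ftir_raw = rng.normal(size=(n_samples, 200)) + 0.5 * y[:, None]

ct_feat = pca_reduce(ct_raw, 34)                  # 34-dimensional micro-CT block
selected = np.arange(0, 200, 16)                  # 13 hypothetical CARS-selected columns
ftir_feat = ftir_raw[:, selected]                 # 13-dimensional FTIR block
mixed = standardize(np.hstack([ct_feat, ftir_feat]))  # 47-dimensional mixed block

params, losses = train_bp(mixed, y, n_classes)
```

Concatenating the two blocks gives each sample a 34 + 13 = 47-dimensional feature vector. Standardizing after concatenation keeps either modality from dominating the network input purely because of its scale.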