To Estimate Performance of Artificial Neural Network Model Based on Terahertz Spectrum: Gelatin Identification as an Example

Front Nutr. 2022 Jul 14:9:925717. doi: 10.3389/fnut.2022.925717. eCollection 2022.

Abstract

Identifying important foods and traditional Chinese medicines (TCM) at low cost is a necessity, and terahertz time-domain spectroscopy (THz-TDS) is a promising means of achieving highly accurate identification. In this study, feedforward artificial neural networks (ANNs) based on terahertz spectra are employed to predict the animal origin of gelatins. Their suitability for this task is examined with parallel models built using random sample partitioning and random initialization. Although the feedforward ANNs predict the training samples accurately, their generalization performance on the original data is unsatisfactory. Multivariate scattering correction is applied to improve prediction accuracy, and 20 additional models verify the effectiveness of this preprocessing. A special partition of the total dataset is then derived from the statistics of the parallel models, and its influence on ANN performance is investigated with another 20 models. These models perform unsatisfactorily because, according to principal component analysis, the training and test sets differ notably. By comparing the distribution of the first two principal components before and after multivariate scattering correction, we found that the reciprocal of the minimum number of line segments required for error-free classification in the 2-D feature space can serve as an index of the linear separability of the data. A higher value of this linear-separability index would lower the need for harsh parameter tuning of ANN models and make them more tolerant of random initialization. The difference in the principal components of samples between the training set and the test set determines whether a partition is acceptable and whether a model would generalize. A rapid way to estimate the performance of an ANN on a classification task before extensive tuning is to compare the differences between groups with the differences within groups.
Because the gelatin THz spectrum discussed in this article is a representative curve without characteristic peaks, this analysis may be helpful for studies of other featureless species.
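The abstract names three operations without giving their formulas: scatter correction, projection onto the first two principal components, and a comparison of between-group and within-group differences. The sketch below illustrates them with NumPy under stated assumptions. It assumes the common MSC formulation (each spectrum regressed on the mean spectrum, often called multiplicative scatter correction) and computes PCA via SVD. It also uses a hypothetical `between_within_ratio` (mean distance between class centroids divided by mean distance of samples to their own centroid) as a stand-in for the group comparison. This ratio is not the authors' line-segment separability index, and the spectra are synthetic, not gelatin data.

```python
import numpy as np


def msc(spectra, reference=None):
    """Scatter correction in the common MSC form (assumed formulation).

    Each spectrum x_i is fitted by least squares as x_i ~ a_i + b_i * r,
    where r is a reference spectrum (default: the mean spectrum), and is
    then corrected as (x_i - a_i) / b_i.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns [slope, intercept]
        corrected[i] = (x - a) / b
    return corrected


def pca_scores(X, n_components=2):
    """Project mean-centred data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def between_within_ratio(scores, labels):
    """Hypothetical index: mean distance between class centroids divided by
    the mean distance of samples to their own class centroid."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([scores[labels == c].mean(axis=0) for c in classes])
    pairwise = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    between = pairwise[np.triu_indices(len(classes), k=1)].mean()
    within = np.mean([
        np.linalg.norm(scores[labels == c] - centroids[k], axis=1).mean()
        for k, c in enumerate(classes)
    ])
    return between / within


# Synthetic, featureless "spectra" (NOT real gelatin data): two classes that
# differ only by a small linear term; each sample is distorted by a random
# additive offset a and multiplicative factor b, plus Gaussian noise.
rng = np.random.default_rng(0)
freq = np.linspace(0.2, 2.0, 200)  # hypothetical frequency axis in THz
class_shapes = {0: freq**2, 1: freq**2 + 0.2 * freq}
X, y = [], []
for label, shape in class_shapes.items():
    for _ in range(20):
        a = rng.uniform(-0.2, 0.2)
        b = rng.uniform(0.7, 1.3)
        X.append(a + b * shape + rng.normal(0.0, 0.005, freq.size))
        y.append(label)
X, y = np.array(X), np.array(y)

# Compare group separation in the first two PCs before and after correction.
ratio_raw = between_within_ratio(pca_scores(X), y)
ratio_msc = between_within_ratio(pca_scores(msc(X)), y)
print(f"between/within ratio, raw: {ratio_raw:.2f}, after MSC: {ratio_msc:.2f}")
```

In this synthetic setup the random offsets and scale factors dominate the raw principal components, so the ratio is expected to rise after correction. The exact values depend on the invented parameters and say nothing about the results reported in the paper.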

Keywords: artificial neural net (ANN); gelatin; identification; multivariate scattering correction; principal component analysis; terahertz time domain spectroscopy (THz-TDS).