Improving prediction model robustness with virtual sample construction for near-infrared spectra analysis

Anal Chim Acta. 2023 Oct 23:1279:341763. doi: 10.1016/j.aca.2023.341763. Epub 2023 Sep 4.

Abstract

In qualitative near-infrared spectroscopy (NIRS) analysis, when the samples to be analyzed are difficult to obtain or counterexamples are scarce, the resulting models are not robust and generalize poorly. An effective remedy in this case is to construct virtual samples to balance the classes. In this contribution, three virtual spectrum construction strategies, Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), and Deep Convolutional Generative Adversarial Network (DCGAN), were explored to address insufficient or imbalanced sample numbers in NIRS analysis. The strategies were tested on two spectral datasets, melamine and Yali pears. Partial least squares discriminant analysis (PLS-DA) was used to construct the discriminant models, and the Correct Recognition Rate (CRR) was used to evaluate their accuracy. The results show that SMOTE, ADASYN, and DCGAN can all improve the global CRR (CRRglob). SMOTE and ADASYN improve the CRR for majority-class samples (CRRmaj) but decrease the CRR for minority-class samples (CRRmin). With the DCGAN method, CRRglob, CRRmaj, and CRRmin were all improved. The standard deviation of the results across multiple parallel calculations demonstrates the robustness of the DCGAN generation method. Therefore, the DCGAN method is reliable and practical, and can increase the robustness and generalization ability of NIRS models.
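To make the SMOTE idea and the three CRR figures concrete, the following is a minimal sketch and not the authors' implementation. It assumes that SMOTE builds each virtual spectrum by linear interpolation between a minority-class spectrum and one of its k nearest minority-class neighbours, and that CRR is the fraction of correctly classified samples, computed over all samples (CRRglob) or over one class only (CRRmaj, CRRmin). The function names `smote_sketch` and `crr`, the parameter defaults, and the toy data are hypothetical.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Build n_new virtual spectra by SMOTE-style interpolation (illustrative only).

    X_min: array of shape (n_samples, n_wavelengths) with minority-class spectra.
    Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances between minority spectra.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base spectrum and one of its k nearest neighbours.
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment between the two spectra.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def crr(y_true, y_pred, label=None):
    """Fraction of correctly recognized samples (assumed definition of CRR).

    label=None gives the global rate; otherwise only samples whose true
    class equals `label` are counted.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    mask = np.ones(y_true.shape, dtype=bool) if label is None else (y_true == label)
    return float(np.mean(y_pred[mask] == y_true[mask]))


if __name__ == "__main__":
    # Toy data standing in for spectra: 6 minority samples, 20 wavelengths.
    X_minority = np.random.default_rng(1).normal(size=(6, 20))
    virtual = smote_sketch(X_minority, n_new=10, k=3)
    print(virtual.shape)

    # Toy labels: class 1 is the majority class, class 0 the minority class.
    y_true = [1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 1]
    print(crr(y_true, y_pred), crr(y_true, y_pred, 1), crr(y_true, y_pred, 0))
```

In a workflow like the one described, the virtual spectra would be appended to the minority class before training the discriminant model, and the three rates would be computed on held-out real spectra. ADASYN and DCGAN differ in how the virtual spectra are generated and are not covered by this sketch.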

Keywords: Deep convolutional generative adversarial network; Near-infrared spectroscopy; Spectral dataset balancing; Virtual sample construction.