Co-training based virtual sample generation for solving the small sample size problem in process industry

ISA Trans. 2023 Mar;134:290-301. doi: 10.1016/j.isatra.2022.08.021. Epub 2022 Aug 26.

Abstract

With ongoing industrialization, the production scale and complexity of process industries continue to grow. However, because samples in the process industry are few and unevenly distributed, it is difficult to build accurate and efficient data-driven soft sensor models for predicting some variables. To broaden the application of soft sensor models, an ideal way to address this problem is to generate new virtual samples that follow the original sample distribution and thereby extend the sample set. In this paper, a novel virtual sample generation method based on the co-training of two K-Nearest Neighbor (KNN) models is proposed. First, sparse regions in each dimension of the feature space are identified according to a sparse parameter. Second, the input features of virtual samples are generated in these sparse regions by interpolation. Third, the outputs of the virtual samples are predicted by two KNN regressors trained through co-training. Qualified virtual samples are then selected and used to update the models, which improves the prediction accuracy of both KNN regressors. To verify the effectiveness and superiority of the proposed co-training-based virtual sample generation method (CTVSG), case studies are conducted on two standard functions and an industrial Purified Terephthalic Acid (PTA) dataset, and the results confirm the effectiveness of CTVSG.
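The three steps described above (sparse-region detection, interpolation, co-trained KNN labelling with screening) can be illustrated with a minimal NumPy sketch. This is not the authors' CTVSG implementation, and several pieces are assumptions made only for illustration. First, a gap between consecutive sorted values in one dimension is treated as "sparse" when it is wider than `sparse_param` times that feature's range. Second, a virtual input is placed at the midpoint of the two samples bordering the gap. Third, the screening step borrows the confidence criterion of COREG-style co-training regression: a candidate is trusted by a regressor if adding it reduces the squared error on its labelled neighbours, and the regressor then passes it to the other regressor. The two KNN configurations (k=3 Euclidean, k=5 Manhattan), `rounds=10`, and all function names are hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k, p):
    """KNN regression: mean target of the k nearest training points (Minkowski-p distance)."""
    X_query = np.atleast_2d(X_query)
    diffs = np.abs(X_query[:, None, :] - X_train[None, :, :])
    dists = np.sum(diffs ** p, axis=2) ** (1.0 / p)
    k = min(k, len(X_train))
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def sparse_gap_midpoints(X, sparse_param):
    """Hypothetical sparse-region rule: in each dimension, a gap between consecutive sorted
    samples wider than sparse_param * (feature range) is sparse; one virtual input is
    linearly interpolated halfway between the two samples that border the gap."""
    candidates = []
    for d in range(X.shape[1]):
        order = np.argsort(X[:, d])
        values = X[order, d]
        span = values[-1] - values[0]
        for i in range(len(values) - 1):
            if span > 0 and values[i + 1] - values[i] > sparse_param * span:
                a, b = X[order[i]], X[order[i + 1]]
                candidates.append(0.5 * (a + b))  # interpolation at t = 0.5
    return np.array(candidates).reshape(-1, X.shape[1])


def confidence(X, y, x_new, y_new, k, p):
    """COREG-style confidence (assumed criterion): drop in squared error on the k labelled
    neighbours of x_new after (x_new, y_new) is added to the training set."""
    dists = np.sum(np.abs(X - x_new) ** p, axis=1) ** (1.0 / p)
    nbrs = np.argsort(dists)[:k]
    before = knn_predict(X, y, X[nbrs], k, p)
    after = knn_predict(np.vstack([X, x_new]), np.append(y, y_new), X[nbrs], k, p)
    return np.sum((y[nbrs] - before) ** 2) - np.sum((y[nbrs] - after) ** 2)


def co_train_virtual_samples(X, y, candidates, configs=((3, 2), (5, 1)), rounds=10):
    """Two KNN regressors with different (k, p) label the candidate pool. Each round, each
    regressor picks its most confident candidate (positive error drop) and hands it, with
    its own predicted label, to the other regressor's training set."""
    sets = [(X.copy(), y.copy()), (X.copy(), y.copy())]
    pool = list(range(len(candidates)))
    accepted = []
    for _ in range(rounds):
        picks = []
        for j, (k, p) in enumerate(configs):
            Xj, yj = sets[j]
            best, best_gain = None, 0.0
            for idx in pool:
                y_hat = knn_predict(Xj, yj, candidates[idx], k, p)[0]
                gain = confidence(Xj, yj, candidates[idx], y_hat, k, p)
                if gain > best_gain:
                    best, best_gain = (idx, y_hat), gain
            picks.append(best)
        if all(pick is None for pick in picks):
            break  # no candidate is trusted by either regressor
        for j, pick in enumerate(picks):
            if pick is not None and pick[0] in pool:  # skip if both picked the same one
                idx, y_hat = pick
                Xo, yo = sets[1 - j]
                sets[1 - j] = (np.vstack([Xo, candidates[idx]]), np.append(yo, y_hat))
                pool.remove(idx)
                accepted.append((candidates[idx], y_hat))
    return sets, accepted


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 10, size=(15, 1)), axis=0)
    y = np.sin(X[:, 0])  # a standard test function, used here only as a toy target
    candidates = sparse_gap_midpoints(X, sparse_param=0.1)
    sets, accepted = co_train_virtual_samples(X, y, candidates)
    print(f"{len(candidates)} candidates, {len(accepted)} accepted virtual samples")
```

Handing each trusted candidate to the *other* regressor, rather than to itself, is what makes this co-training: the two models differ in k and distance metric, so each can supply labels the other is less certain about. Because KNN labels are averages of existing targets, the virtual outputs in this sketch never leave the range of the original outputs.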

Keywords: Industrial process; Small sample size; Soft sensor; Virtual sample generation.