Data augmentation with norm-AE and selective pseudo-labelling for unsupervised domain adaptation

Qian Wang; Fanlin Meng; Toby P Breckon

doi:10.1016/j.neunet.2023.02.006

Neural Netw. 2023 Apr:161:614-625. doi: 10.1016/j.neunet.2023.02.006. Epub 2023 Feb 11.

Authors

Qian Wang¹, Fanlin Meng², Toby P Breckon³

Affiliations

¹ Department of Computer Science, Durham University, UK. Electronic address: qian.wang173@hotmail.com.
² Alliance Manchester Business School, University of Manchester, UK.
³ Department of Computer Science, Durham University, UK; Department of Engineering, Durham University, UK.

PMID: 36827959
DOI: 10.1016/j.neunet.2023.02.006

Abstract

We address the Unsupervised Domain Adaptation (UDA) problem in image classification from a new perspective. In contrast to most existing works, which either align the data distributions or learn domain-invariant features, we directly learn a unified classifier for both the source and target domains in the high-dimensional homogeneous feature space without explicit domain alignment. To this end, we employ the effective Selective Pseudo-Labelling (SPL) technique to take advantage of the unlabelled samples in the target domain. Surprisingly, the data distribution discrepancy across the source and target domains can be handled well by a computationally simple classifier (e.g., a shallow Multi-Layer Perceptron) trained in the original feature space. In addition, we propose a novel generative model, norm-AE, to generate synthetic features for the target domain as a data augmentation strategy to enhance classifier training. Experimental results on several benchmark datasets demonstrate that the pseudo-labelling strategy alone can achieve performance comparable to many state-of-the-art methods, whilst the use of norm-AE for feature augmentation further improves performance in most cases. As a result, our proposed methods (i.e., naive-SPL and norm-AE-SPL) achieve performance comparable to state-of-the-art methods, with average accuracies of 93.4% and 90.4% on the Office-Caltech and ImageCLEF-DA datasets, and competitive performance on the Digits, Office31 and Office-Home datasets, with average accuracies of 97.2%, 87.6% and 68.6% respectively.
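
The selective pseudo-labelling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the selection rule or the training schedule, so the following are assumptions made for illustration only: target samples are selected class by class, keeping the most confident fraction; that fraction grows linearly over the rounds; and a softmax-regression classifier stands in for the shallow Multi-Layer Perceptron. The names `predict_proba`, `train_softmax`, `select_pseudo_labels` and `spl_train` are hypothetical, and norm-AE feature augmentation is not covered.

```python
import numpy as np


def predict_proba(X, W, b):
    """Softmax class probabilities for features X."""
    z = X @ W + b
    z -= z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_softmax(X, y, n_classes, epochs=200, lr=0.5):
    """Stand-in classifier (assumption): softmax regression by gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        G = (predict_proba(X, W, b) - Y) / len(X)  # cross-entropy gradient
        W -= lr * X.T @ G
        b -= lr * G.sum(axis=0)
    return W, b


def select_pseudo_labels(probs, fraction):
    """Keep the most confident `fraction` of samples within each predicted class."""
    labels = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n_keep = max(1, int(np.ceil(fraction * idx.size)))
        order = idx[np.argsort(-confidence[idx])]  # most confident first
        keep.extend(order[:n_keep].tolist())
    keep = np.array(sorted(keep), dtype=int)
    return keep, labels[keep]


def spl_train(Xs, ys, Xt, n_classes, n_rounds=5):
    """Retrain on source data plus a growing set of pseudo-labelled target data."""
    W, b = train_softmax(Xs, ys, n_classes)  # round 0: source only
    for t in range(1, n_rounds + 1):
        probs = predict_proba(Xt, W, b)
        idx, pseudo = select_pseudo_labels(probs, t / n_rounds)
        X = np.vstack([Xs, Xt[idx]])
        y = np.concatenate([ys, pseudo])
        W, b = train_softmax(X, y, n_classes)
    return W, b


if __name__ == "__main__":
    # Synthetic 2-D "features": the target domain is a shifted copy of the source.
    rng = np.random.default_rng(0)
    centres = np.array([[-2.0, 0.0], [2.0, 0.0]])
    ys = rng.integers(0, 2, size=200)
    yt = rng.integers(0, 2, size=200)
    Xs = centres[ys] + rng.normal(size=(200, 2))
    Xt = centres[yt] + np.array([0.5, 1.0]) + rng.normal(size=(200, 2))
    W, b = spl_train(Xs, ys, Xt, n_classes=2)
    acc = (predict_proba(Xt, W, b).argmax(axis=1) == yt).mean()
    print(f"target accuracy: {acc:.3f}")
```

Selecting within each predicted class, rather than by one global confidence threshold, is one way to stop an easy class from dominating the early pseudo-labelled set; whether the paper uses this exact rule is not stated in the abstract.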

Keywords: Data augmentation; Selective Pseudo-Labelling; Unsupervised Domain Adaptation; Variational autoencoder.

MeSH terms

Benchmarking*
Learning*
Neural Networks, Computer