Advancing Imbalanced Domain Adaptation: Cluster-Level Discrepancy Minimization With a Comprehensive Benchmark

IEEE Trans Cybern. 2023 Feb;53(2):1106-1117. doi: 10.1109/TCYB.2021.3093888. Epub 2023 Jan 13.

Abstract

Unsupervised domain adaptation methods have been proposed to tackle covariate shift by minimizing the distribution discrepancy between the feature embeddings of the source and target domains. However, standard evaluation protocols assume that the conditional label distributions of the two domains are invariant, which is usually inconsistent with real-world scenarios such as the long-tailed distribution of visual categories. In this article, imbalanced domain adaptation (IDA) is formulated for a more realistic scenario in which both label shift and covariate shift occur between the two domains. Theoretically, when label shift exists, aligning the marginal distributions may result in negative transfer. Therefore, a novel cluster-level discrepancy minimization (CDM) method is developed. CDM uses cross-domain similarity learning to learn tight and discriminative clusters, which are then used for both feature-level and distribution-level discrepancy minimization, mitigating the negative effect of label shift during domain transfer. Theoretical justifications further demonstrate that CDM minimizes the target risk in a progressive manner. To corroborate the effectiveness of CDM, we propose two evaluation protocols that reflect real-world situations and benchmark existing domain adaptation approaches. Extensive experiments demonstrate that negative transfer does occur due to label shift, while our approach achieves significant improvement on imbalanced datasets, including Office-31, Image-CLEF, and Office-Home.
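
The claim that marginal alignment can cause negative transfer under label shift can be illustrated with a toy calculation. The sketch below is not the CDM method from the article; it is a hypothetical one-dimensional example in which the class means, class priors, and the helper names `marginal_mean` and `classwise_gap` are all assumptions chosen for illustration. Both domains share identical per-class feature means, but the target is imbalanced, so the marginal means differ. A global shift that closes the marginal gap then pulls every target class away from its source counterpart.

```python
# Toy illustration (assumed numbers, not from the article): two classes,
# one-dimensional features, identical class-conditional means in both domains.

def marginal_mean(class_means, priors):
    """Mean of a mixture: sum over classes of p(y=k) * E[x | y=k]."""
    return sum(p * m for p, m in zip(priors, class_means))

def classwise_gap(src_means, tgt_means):
    """Largest per-class distance between source and target class means."""
    return max(abs(s - t) for s, t in zip(src_means, tgt_means))

src_means = [-2.0, 2.0]   # source class means (assumed)
tgt_means = [-2.0, 2.0]   # target class means: already aligned per class
src_priors = [0.5, 0.5]   # balanced source labels (assumed)
tgt_priors = [0.9, 0.1]   # imbalanced target labels, i.e. label shift (assumed)

src_marginal = marginal_mean(src_means, src_priors)
tgt_marginal = marginal_mean(tgt_means, tgt_priors)

# Before any alignment: classes match, but the marginals do not.
marginal_gap_before = abs(src_marginal - tgt_marginal)      # about 1.6
class_gap_before = classwise_gap(src_means, tgt_means)      # exactly 0

# "Marginal alignment" as a single global shift of the target features.
shift = src_marginal - tgt_marginal
tgt_shifted = [m + shift for m in tgt_means]

# After alignment: marginals match, but every class is now misaligned.
marginal_gap_after = abs(src_marginal - marginal_mean(tgt_shifted, tgt_priors))
class_gap_after = classwise_gap(src_means, tgt_shifted)     # about 1.6

print(f"before: marginal gap={marginal_gap_before:.2f}, class gap={class_gap_before:.2f}")
print(f"after:  marginal gap={marginal_gap_after:.2f}, class gap={class_gap_after:.2f}")
```

In this toy setting the per-class (cluster-level) gap is the quantity that stays meaningful under label shift, which is the intuition behind measuring discrepancy at the cluster level rather than on the marginals.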