Semantic-Guided Class-Imbalance Learning Model for Zero-Shot Image Classification

IEEE Trans Cybern. 2022 Jul;52(7):6543-6554. doi: 10.1109/TCYB.2020.3004641. Epub 2022 Jul 4.

Abstract

In this article, we focus on zero-shot image classification (ZSIC), a task that equips a learning system with the ability to recognize images from unseen classes. Compared with traditional image classification, ZSIC suffers more readily from class imbalance because it depends more heavily on class-level knowledge transfer. In the real world, the number of samples per category generally follows a long-tailed distribution, and with traditional batch-based training the discriminative information in sample-scarce seen classes is hard to transfer to the related unseen classes, which substantially degrades overall generalization. To alleviate the class-imbalance issue in ZSIC, we propose a sample-balanced training process that encourages all training classes to contribute equally to the learned model. Specifically, we randomly select the same number of images from every training class to form each training batch, so that sample-scarce classes contribute as much as classes with sufficient samples during each iteration. Considering that instances of the same class differ in class representativeness, we further develop an efficient semantic-guided feature fusion model that assigns different weights to the selected samples according to their class representativeness, yielding a discriminative class visual prototype for the subsequent visual-semantic interaction process. Extensive experiments on three imbalanced ZSIC benchmark datasets, covering both traditional and generalized ZSIC tasks, demonstrate that our approach achieves promising results, especially for unseen categories that are closely related to sample-scarce seen categories. In addition, experimental results on two class-balanced datasets show that the proposed approach also improves classification performance over the baseline model.
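
The two ideas described in the abstract, drawing the same number of images from every class per batch and fusing the drawn samples into a weighted class prototype, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `balanced_batch` and `weighted_prototype`, the with-replacement fallback for classes that have fewer images than requested, and the use of a softmax over externally supplied representativeness scores are all assumptions made for illustration; the abstract does not say how the semantic-guided model computes its weights.

```python
import math
import random
from collections import defaultdict


def balanced_batch(labels, per_class, rng=None):
    """Return sample indices with exactly `per_class` picks from every class."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    batch = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) >= per_class:
            # Enough images: draw without replacement.
            batch.extend(rng.sample(pool, per_class))
        else:
            # Assumption: sample-scarce classes are drawn with replacement.
            batch.extend(rng.choices(pool, k=per_class))
    return batch


def weighted_prototype(features, scores):
    """Fuse per-sample feature vectors into one class prototype.

    `scores` are hypothetical representativeness scores (one per sample);
    they are turned into weights with a numerically stable softmax.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


# Long-tailed toy label list: 50 cats, 5 dogs, 2 okapis.
labels = ["cat"] * 50 + ["dog"] * 5 + ["okapi"] * 2
batch = balanced_batch(labels, per_class=3, rng=random.Random(0))
counts = defaultdict(int)
for i in batch:
    counts[labels[i]] += 1

# Equal scores give equal weights, so the prototype is the plain mean.
prototype = weighted_prototype([[0.0, 2.0], [2.0, 0.0]], [0.0, 0.0])
```

In this toy run every class contributes three indices to the batch regardless of how many images it has, which is the balancing property the abstract describes. In practice the scores passed to `weighted_prototype` would come from a learned model rather than being supplied by hand.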