Zero-shot learning (ZSL) aims to recognize objects in images when no training data is available for their classes. Under the generalized zero-shot learning (GZSL) setting, test objects may belong to either seen or unseen categories. Many recent studies perform zero-shot learning by leveraging generative networks to synthesize visual features for unseen classes from class-specific semantic features. However, user-defined semantic information is incomplete and lacks discriminability, and most generative methods use it directly as a constraint on the generative model, so the synthesized visual features lack diversity and separability. In this paper, we propose a novel method that improves the semantic features by utilizing discriminative visual features. Furthermore, we build a novel Augmented Semantic Feature Based Generative Network (ASFGN) to synthesize separable visual representations for unseen classes. Since GAN-based generative models may suffer from mode collapse, we propose a novel collapse-alleviate loss to improve the training stability and generalization performance of our generative network. Extensive experiments on four benchmark datasets show that our method outperforms state-of-the-art approaches in both the ZSL and GZSL settings.
Keywords: Augmented semantic feature; Feature generation; Generalized zero-shot learning; Generative network; Mode collapse.
Copyright © 2021 Elsevier Ltd. All rights reserved.