Cross-modal distribution alignment embedding network for generalized zero-shot learning

Qin Li; Mingzhen Hou; Hong Lai; Ming Yang

doi:10.1016/j.neunet.2022.01.007

Neural Netw. 2022 Apr:148:176-182. doi: 10.1016/j.neunet.2022.01.007. Epub 2022 Jan 29.

Authors

Qin Li¹, Mingzhen Hou², Hong Lai³, Ming Yang⁴

Affiliations

¹ School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen 518172, China.
² State Key Laboratory of Integrated Services Networks, Xidian University, Shaanxi 710071, China.
³ School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen 518172, China. Electronic address: laih@sziit.edu.cn.
⁴ Departments of Mathematics and Computer & Information Science, Westfield State University, Westfield, MA 01086, United States of America.

PMID: 35144151
DOI: 10.1016/j.neunet.2022.01.007

Abstract

Many approaches to generalized zero-shot learning (GZSL) rely on a cross-modal mapping between the image feature space and the class embedding space to transfer knowledge from seen to unseen classes. However, the two spaces are fundamentally different and their manifolds are inconsistent, and existing methods suffer from highly overlapping semantic descriptions of different classes; as a result, in GZSL tasks unseen classes are easily misclassified as seen classes. To address these problems, we adopt a novel semantic embedding network that encodes more discriminative information when mapping the initial semantic attributes to semantic embeddings in the visual space. Meanwhile, a distribution alignment constraint keeps the distribution of the learned semantic embeddings consistent with that of real image features. Moreover, an auxiliary classifier is adopted to improve the quality of the learned semantic embeddings. Finally, a relation network classifies unseen images by computing relation scores between the semantic embeddings and image features, which is much more flexible than fixed distance metric functions. Experimental results demonstrate that the proposed method is superior to other state-of-the-art methods.
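
The two ideas named in the abstract, a distribution alignment constraint and a learned relation score, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the abstract does not state the form of the alignment constraint (first-moment matching is used here as a stand-in), and the function names, the one-hidden-layer relation module, and the untrained random weights are hypothetical rather than the authors' implementation.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a batch of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def alignment_penalty(semantic_embeddings, image_features):
    """Assumed stand-in for the alignment constraint: squared distance
    between the batch means of the two modalities (first-moment matching)."""
    mu_s = mean_vector(semantic_embeddings)
    mu_v = mean_vector(image_features)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_v))


def relation_score(embedding, feature, w_hidden, w_out):
    """Hypothetical relation module: one ReLU hidden layer applied to the
    concatenated (embedding, feature) pair, squashed to (0, 1) by a sigmoid."""
    pair = embedding + feature  # list concatenation
    hidden = [max(0.0, sum(w * x for w, x in zip(row, pair))) for row in w_hidden]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def predict(feature, class_embeddings, w_hidden, w_out):
    """Score the image feature against every class embedding and return
    the best-scoring class together with all scores."""
    scores = {
        name: relation_score(emb, feature, w_hidden, w_out)
        for name, emb in class_embeddings.items()
    }
    return max(scores, key=scores.get), scores


# Toy usage with untrained random weights (illustrative values only).
rng = random.Random(0)
dim, hidden_units = 3, 4
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(hidden_units)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_units)]
class_embeddings = {
    "seen_class": [0.9, 0.1, 0.0],
    "unseen_class": [0.1, 0.8, 0.3],
}
image_feature = [0.2, 0.7, 0.4]
label, scores = predict(image_feature, class_embeddings, w_hidden, w_out)
penalty = alignment_penalty(list(class_embeddings.values()), [image_feature])
```

In a trained system the relation weights would be learned jointly with the embedding network, so the comparison function itself adapts to the data, whereas a fixed metric such as cosine or Euclidean distance cannot.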

Keywords: Generalized zero-shot learning; Image classification; Weakly-supervised learning.

MeSH terms

Knowledge
Learning
Machine Learning*
Semantic Web
Semantics*