Part-Object Progressive Refinement Network for Zero-Shot Learning

IEEE Trans Image Process. 2024;33:2032-2043. doi: 10.1109/TIP.2024.3374217. Epub 2024 Mar 18.

Abstract

Zero-shot learning (ZSL) recognizes unseen images by sharing semantic knowledge transferred from seen images, which motivates the study of associations between semantic and visual information. Prior works either align global visual features with semantic information, i.e., attribute vectors, or further mine the local part regions related to each attribute and then simply concatenate them for category decisions. Although effective, these works ignore the intrinsic interactions between local parts and the whole object, which enable a more discriminative and representative knowledge transfer for ZSL. In this paper, we propose a Part-Object Progressive Refinement Network (POPRNet), where discriminative and transferable semantics are progressively refined through the cooperation between parts and the whole object. Specifically, POPRNet incorporates discriminative part semantics and object-centric semantics guided by semantic intensity to improve cross-domain transferability. To achieve part-object learning, a semantic-augment transformer (SaT) is proposed to model the part-object relation at the part level via an encoder and at the object level via a decoder, generating a comprehensive semantic representation that boosts discriminability and transferability. A prototype updating module with embedded prototype selection layers enhances the discriminative ability of the updated category prototypes, further improving ZSL recognition performance. Extensive experiments on three public benchmark datasets demonstrate the superiority and competitiveness of the proposed POPRNet. The code is available at https://github.com/ManLiuCoder/POPRNet.
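
For readers new to the setting, the attribute-based decision rule that ZSL methods generally build on can be sketched as follows. This is a minimal illustration, not POPRNet itself: it assumes an image has already been mapped into attribute space by some model trained on seen classes, and it assigns the unseen class whose attribute vector is most similar. The attribute names, class names, vectors, and the helpers `cosine` and `predict_class` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_class(semantic_embedding, class_attributes):
    """Return the class whose attribute vector is most similar to the embedding."""
    return max(
        class_attributes,
        key=lambda name: cosine(semantic_embedding, class_attributes[name]),
    )


# Hypothetical attribute vectors, ordered as [striped, has_hooves, has_wings].
# No training images exist for these classes; only their attributes are known.
unseen_classes = {
    "zebra": [1.0, 1.0, 0.0],
    "sparrow": [0.0, 0.0, 1.0],
}

# Assumed output of a visual-to-semantic model trained only on seen classes.
image_embedding = [0.9, 0.8, 0.1]

prediction = predict_class(image_embedding, unseen_classes)
print(prediction)
```

In this toy setup the embedding scores high on "striped" and "has_hooves", so the nearest attribute vector belongs to the zebra class. The approach described in the abstract refines the semantic representation and the category prototypes before a comparison of this kind; the sketch covers only the final matching step.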