Towards Lightweight Pixel-Wise Hallucination for Heterogeneous Face Recognition

Chaoyou Fu; Xiaoqiang Zhou; Weizan He; Ran He

doi:10.1109/TPAMI.2022.3227180

IEEE Trans Pattern Anal Mach Intell. 2023 Jul;45(7):9135-9148. doi: 10.1109/TPAMI.2022.3227180. Epub 2023 Jun 5.

PMID: 37015576

Abstract

Cross-spectral face hallucination is an intuitive way to mitigate the modality discrepancy in Heterogeneous Face Recognition (HFR). However, due to imaging differences, the hallucination inevitably suffers from shape misalignment between paired heterogeneous images. Rather than building complicated architectures to circumvent the problem, as previous works do, we propose a simple yet effective method called Shape Alignment FacE (SAFE). Specifically, given an image, we align its shape to that of its paired image with the assistance of a 3D face model. The resulting aligned pair enables us to train a lightweight generator that concentrates solely on spectrum translation under pixel-wise supervision. However, since the 3D face model cannot handle attributes such as hair and glasses, pixel discrepancies remain between the aligned pair. To address this, in the image space we introduce a probabilistic pixel-wise loss that incorporates the discrepancies into a probabilistic distribution. Moreover, to alleviate the influence of shape misalignment on spectrum translation, a spectrum optimal transport is performed in a shape-irrelevant latent space. Note that in the final inference phase, all auxiliary modules are discarded and only the lightweight generator is kept. In addition to superior performance in qualitative synthesis and quantitative recognition, extensive experiments on 6 datasets demonstrate that our method has two further advantages over existing state-of-the-art counterparts. The first is a more lightweight generator: compared with the state-of-the-art method, our method achieves higher recognition results with 128x fewer parameters and 63x fewer FLOPs, at a latency of only 4.58 ms on a single TITAN-XP. The second is the ability to train on low-shot datasets such as Oulu-CASIA NIR-VIS, which contains only 1,920 images from 20 identities. To the best of our knowledge, ours is the first method to perform well on such a small-scale dataset. These advantages make our method more practical in the real world and further push the boundaries of heterogeneous face recognition.
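
The abstract does not state the form of the probabilistic pixel-wise loss. As a rough illustration of the general idea only, the sketch below assumes a per-pixel Laplace likelihood whose scale is predicted alongside the image, so that pixels where the aligned pair still disagree (for example hair or glasses) can be down-weighted by a larger scale. The function name `laplace_pixel_nll`, the choice of a Laplace distribution, and the `log_scale` input are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def laplace_pixel_nll(pred, target, log_scale):
    """Mean per-pixel negative log-likelihood under a Laplace distribution.

    Hypothetical sketch, not the paper's loss. Each pixel of `target` is
    modelled as Laplace(mu=pred, b=exp(log_scale)), where `log_scale` is a
    per-pixel map that a network would predict together with `pred`.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    log_scale = np.asarray(log_scale, dtype=np.float64)

    abs_err = np.abs(pred - target)
    # Laplace NLL per pixel: |x - mu| / b + log(2 * b), with b = exp(log_scale).
    nll = abs_err * np.exp(-log_scale) + log_scale + np.log(2.0)
    return float(nll.mean())


# Toy 2x2 "images": one pixel disagrees strongly (e.g. a glasses frame).
pred = np.zeros((2, 2))
target = np.array([[0.0, 0.0], [0.0, 4.0]])

# A fixed unit scale everywhere reduces to a plain L1 loss plus a constant.
loss_fixed = laplace_pixel_nll(pred, target, np.zeros((2, 2)))

# A larger predicted scale on the mismatched pixel lowers its penalty.
adaptive = np.zeros((2, 2))
adaptive[1, 1] = np.log(4.0)
loss_adaptive = laplace_pixel_nll(pred, target, adaptive)
```

With a unit scale everywhere the expression reduces to the ordinary mean absolute error plus a constant, which is why this family of losses is often described as a pixel-wise loss with learned per-pixel confidence. Whether SAFE uses this or a different distribution cannot be determined from the abstract alone.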