Adaptive Spatial Transformation Networks for Periocular Recognition

Sensors (Basel). 2023 Feb 23;23(5):2456. doi: 10.3390/s23052456.

Abstract

Periocular recognition has emerged as a particularly valuable biometric identification method in challenging scenarios, such as partially occluded faces due to COVID-19 protective masks, in which face recognition might not be applicable. This work presents a periocular recognition framework based on deep learning, which automatically localises and analyses the most important areas in the periocular region. The main idea is to derive several parallel local branches from a neural network architecture, which, in a semi-supervised manner, learn the most discriminative areas in the feature map and solve the identification problem based solely on the corresponding cues. Here, each local branch learns a transformation matrix that allows for basic geometrical transformations (cropping and scaling), which is used to select a region of interest in the feature map, further analysed by a set of shared convolutional layers. Finally, the information extracted by the local branches and the main global branch is fused for recognition. The experiments carried out on the challenging UBIRIS-v2 benchmark show that by integrating the proposed framework with various ResNet architectures, we consistently obtain an improvement in mAP of more than 4% over the "vanilla" architecture. In addition, extensive ablation studies were performed to better understand the behaviour of the network and how the spatial transformation and the local branches influence the overall performance of the model. The proposed method can be easily adapted to other computer vision problems, which is also regarded as one of its strengths.
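To make the described mechanism concrete, the sketch below illustrates one possible reading of a "local branch": a small localisation head predicts a constrained affine matrix (scaling and translation only, i.e. cropping and scaling), the matrix is applied to the backbone feature map with a spatial transformer (affine_grid / grid_sample), and the selected region is then analysed by convolutional layers shared across branches before fusion with the global branch. This is not the authors' code; all layer sizes, names, and the exact parameterisation of the transform are illustrative assumptions.

```python
# Minimal sketch of a spatial-transformer-style local branch (assumed design,
# not the published implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalBranch(nn.Module):
    def __init__(self, in_channels: int, out_size: int = 7):
        super().__init__()
        self.out_size = out_size
        # Localisation head: predicts (sx, sy, tx, ty) for the affine matrix.
        self.loc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, 4),
        )
        # Initialise close to the identity transform (use the full feature map).
        nn.init.zeros_(self.loc[-1].weight)
        self.loc[-1].bias.data = torch.tensor([1.0, 1.0, 0.0, 0.0])

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        n = feats.size(0)
        sx, sy, tx, ty = self.loc(feats).unbind(dim=1)
        # Affine matrix restricted to scaling and translation (no rotation or
        # shear), matching the "cropping and scaling" constraint in the text.
        theta = torch.zeros(n, 2, 3, device=feats.device)
        theta[:, 0, 0] = torch.sigmoid(sx)   # scale in x, kept in (0, 1)
        theta[:, 1, 1] = torch.sigmoid(sy)   # scale in y
        theta[:, 0, 2] = torch.tanh(tx)      # translation in x, in (-1, 1)
        theta[:, 1, 2] = torch.tanh(ty)      # translation in y
        grid = F.affine_grid(
            theta, (n, feats.size(1), self.out_size, self.out_size),
            align_corners=False)
        return F.grid_sample(feats, grid, align_corners=False)


# Usage sketch: several local branches read the same backbone feature map and
# share the analysis convolutions; their embeddings are fused with the global one.
backbone_feats = torch.randn(8, 512, 14, 14)          # e.g. a ResNet stage output
branches = nn.ModuleList(LocalBranch(512) for _ in range(3))
shared_conv = nn.Sequential(
    nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
local_embs = [shared_conv(b(backbone_feats)) for b in branches]
global_emb = shared_conv(backbone_feats)
fused = torch.cat([global_emb, *local_embs], dim=1)   # fed to the identity classifier
```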

Keywords: attention; biometrics; periocular recognition; spatial transform.

MeSH terms

  • Algorithms
  • Biometric Identification* / methods
  • COVID-19*
  • Face / anatomy & histology
  • Humans
  • Neural Networks, Computer

Grants and funding

The contributions of Ehsan Yaghoubi and Simone Frintrop were funded by the German Science Foundation (DFG) in the project Crossmodal Learning, TRR 169, and the contributions of Hugo Proença were funded by FCT/MCTES through national funds and co-funded by EU funds under the project UIDB/EEA/50008/2020.