Attention Weighted Local Descriptors

IEEE Trans Pattern Anal Mach Intell. 2023 Sep;45(9):10632-10649. doi: 10.1109/TPAMI.2023.3266728. Epub 2023 Aug 7.

Abstract

Local feature detection and description are widely used in many vision applications with high industrial and commercial demand. Large-scale applications place high expectations on both the accuracy and speed of local features. Most existing studies on local feature learning focus on describing individual keypoints in isolation, neglecting the relationships among keypoints that arise from global spatial awareness. In this paper, we present AWDesc with a consistent attention mechanism (CoAM) that enables local descriptors to embrace image-level spatial awareness in both the training and matching stages. For local feature detection, we adopt a feature-pyramid-based detector to obtain more stable and accurate keypoint localization. For local feature description, we provide two versions of AWDesc to cope with different accuracy and speed requirements. On the one hand, we introduce Context Augmentation to address the inherent locality of convolutional neural networks by injecting non-local context information, so that local descriptors can "look wider to describe better". Specifically, we propose an Adaptive Global Context Augmented Module (AGCA) and a Diverse Surrounding Context Augmented Module (DSCA) to construct robust local descriptors with context information ranging from global to surrounding. On the other hand, we design an extremely lightweight backbone network coupled with a tailored knowledge distillation strategy to achieve the best trade-off between accuracy and speed. Moreover, we perform thorough experiments on image matching, homography estimation, visual localization, and 3D reconstruction tasks, and the results demonstrate that our method surpasses current state-of-the-art local descriptors. Code is available at: https://github.com/vignywang/AWDesc.
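To make the "look wider to describe better" idea concrete, the following is a minimal illustrative sketch (not the paper's actual AGCA/DSCA modules, whose architectures are not specified in this abstract): each local descriptor cross-attends to all other descriptors in the image, and the resulting per-keypoint global context is fused back via a residual connection. The function name, projection matrices, and mixing weight `alpha` are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def augment_with_global_context(desc, w_q, w_k, w_v, alpha=0.5):
    """Inject image-level context into local descriptors via self-attention.

    desc: (N, D) array of N local descriptors of dimension D.
    w_q, w_k, w_v: (D, D) hypothetical learned projection matrices.
    alpha: mixing weight for the residual context fusion (assumed).
    """
    q = desc @ w_q                                     # queries from each keypoint
    k = desc @ w_k                                     # keys over the whole image
    v = desc @ w_v                                     # values over the whole image
    attn = softmax(q @ k.T / np.sqrt(desc.shape[1]))   # (N, N) attention weights
    ctx = attn @ v                                     # per-keypoint global context
    out = desc + alpha * ctx                           # residual fusion
    # L2-normalize, as is standard for matching local descriptors.
    return out / np.linalg.norm(out, axis=1, keepdims=True)

# Toy usage with random descriptors and random projections.
rng = np.random.default_rng(0)
N, D = 128, 64
desc = rng.normal(size=(N, D))
w_q, w_k, w_v = (rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3))
aug = augment_with_global_context(desc, w_q, w_k, w_v)
```

Each augmented descriptor now depends on every keypoint in the image, which is the essential property the abstract attributes to CoAM-style context augmentation.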