CSR-Net: Learning Adaptive Context Structure Representation for Robust Feature Correspondence

IEEE Trans Image Process. 2022:31:3197-3210. doi: 10.1109/TIP.2022.3166284. Epub 2022 Apr 21.

Abstract

Feature matching, which refers to identifying and corresponding the same or similar visual patterns across two or more images, is a key technique in any image processing task that requires establishing good correspondences between images. Given putative correspondences (matches) in two scenes, a novel whole-part deep learning framework, termed Context Structure Representation Network (CSR-Net), is designed to infer the probability of an arbitrary correspondence being an inlier. Traditional approaches commonly build local relations between correspondences with manually engineered criteria. Unlike these attempts, the main idea of our work is to explicitly learn the neighborhood structure of each correspondence, allowing us to formulate the matching problem as a dynamic local structure consensus evaluation in an end-to-end fashion. For this purpose, we propose a permutation-invariant STructure Representation (STR) learning module, which can easily merge different types of networks into a unified architecture to handle sparse matches directly. In collaboration with STR, we introduce a Context-Aware Attention (CAA) mechanism that adaptively re-calibrates structure features via rotation-invariant context-aware encoding and simple feature gating, enabling fine-grained pattern recognition. Moreover, to further reduce the cost of establishing reliable correspondences, CSR-Net is formulated as whole-part consensus learning, where the whole level aims to compensate for rigid transformations. To demonstrate that CSR-Net effectively boosts the baselines, we conduct extensive experiments on image matching and other visual tasks. The results confirm that CSR-Net significantly outperforms nine state-of-the-art competitors in matching performance.
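Two ingredients of the abstract — a permutation-invariant encoding of each correspondence's neighborhood and a simple feature-gating step — can be illustrated with a minimal sketch. The functions `str_feature` and `gated` below are hypothetical illustrations, not the paper's actual STR or CAA modules: neighbor order is made irrelevant by coordinate-wise max pooling over relative offsets, and gating re-weights feature channels with a sigmoid.

```python
import math

def str_feature(corrs, i, k=3):
    """Illustrative permutation-invariant neighborhood encoding (NOT the
    paper's STR module): pool the relative offsets of the k nearest
    neighbors of correspondence i with a symmetric, order-independent
    coordinate-wise max, so shuffling the input order changes nothing."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    # indices of the k nearest neighbors of correspondence i
    nbrs = sorted((j for j in range(len(corrs)) if j != i),
                  key=lambda j: d2(corrs[i], corrs[j]))[:k]
    # relative offsets from correspondence i to each neighbor
    rel = [tuple(a - b for a, b in zip(corrs[j], corrs[i])) for j in nbrs]
    # symmetric pooling: coordinate-wise max over the neighbor set
    return tuple(max(r[c] for r in rel) for c in range(len(corrs[i])))

def gated(features):
    """Simple per-channel sigmoid feature gating, in the spirit of the
    re-calibration step described in the abstract (an assumption, not
    the paper's CAA mechanism)."""
    dims = len(features[0])
    means = [sum(f[c] for f in features) / len(features) for c in range(dims)]
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]
    return [tuple(f[c] * gates[c] for c in range(dims)) for f in features]
```

Because the pooling is symmetric, the encoding of a given correspondence is unchanged when the input list is permuted, which is the property the abstract attributes to the STR module.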