CrossHomo: Cross-Modality and Cross-Resolution Homography Estimation

IEEE Trans Pattern Anal Mach Intell. 2024 Feb 15:PP. doi: 10.1109/TPAMI.2024.3366234. Online ahead of print.

Abstract

Multi-modal homography estimation aims to spatially align images from different modalities, which is challenging since both image content and resolution vary across modalities. In this paper, we introduce a novel framework, named CrossHomo, to tackle this problem. Our framework is motivated by two findings that demonstrate the mutual benefits between image super-resolution and homography estimation. Based on these findings, we design a flexible multi-level homography estimation network that aligns multi-modal images in a coarse-to-fine manner. Each level is composed of a multi-modal image super-resolution (MISR) module, which shrinks the resolution gap between modalities, followed by a multi-modal homography estimation (MHE) module, which predicts the homography matrix. To the best of our knowledge, CrossHomo is the first attempt to address homography estimation under both modality and resolution discrepancies. Extensive experiments show that CrossHomo achieves high registration accuracy on various multi-modal datasets with different resolution gaps. In addition, the network is efficient in terms of both model complexity and running speed. The source code is available at https://github.com/lep990816/CrossHomo.
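The coarse-to-fine, multi-level structure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `misr_upsample` and `mhe_estimate` stand-ins (plain upsampling and an identity estimate) are hypothetical placeholders for the learned MISR and MHE modules, and the per-level composition of homographies is one common way such refinement pipelines are assembled.

```python
import numpy as np

def misr_upsample(img, scale=2):
    # Placeholder for the paper's multi-modal image super-resolution (MISR)
    # module: here, plain nearest-neighbour upsampling by `scale`.
    return np.kron(img, np.ones((scale, scale)))

def mhe_estimate(src, tgt):
    # Placeholder for the multi-modal homography estimation (MHE) module:
    # a real network would regress the homography parameters from the two
    # images; here we simply return the identity.
    return np.eye(3)

def scale_homography(H, s):
    # Adapt a homography estimated at one resolution to a resolution
    # s times larger by conjugating with the scaling matrix.
    S = np.diag([s, s, 1.0])
    return S @ H @ np.linalg.inv(S)

def crosshomo_pipeline(lr_src, hr_tgt, levels=2):
    # Coarse-to-fine alignment: at each level, upsample the low-resolution
    # modality to shrink the resolution gap, estimate a homography, and
    # compose it with the (rescaled) estimate from the previous level.
    H = np.eye(3)
    img = lr_src
    for _ in range(levels):
        img = misr_upsample(img)                 # MISR step
        H_level = mhe_estimate(img, hr_tgt)      # MHE step
        H = H_level @ scale_homography(H, 2.0)   # refine previous estimate
    return H
```

With identity placeholders the composed result is the identity homography; the structure of the loop is what mirrors the multi-level design.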