Accurate and Efficient Stereo Matching via Attention Concatenation Volume

Gangwei Xu; Yun Wang; Junda Cheng; Jinhui Tang; Xin Yang

doi:10.1109/TPAMI.2023.3335480

IEEE Trans Pattern Anal Mach Intell. 2024 Apr;46(4):2461-2474. doi: 10.1109/TPAMI.2023.3335480. Epub 2024 Mar 6.

PMID: 38015702

Abstract

Stereo matching is a fundamental building block for many vision and robotics applications. An informative and concise cost volume representation is vital for accurate and efficient stereo matching. In this article, we present a novel cost volume construction method, named attention concatenation volume (ACV), which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume. The ACV can be seamlessly embedded into most stereo matching networks; the resulting networks can use a more lightweight aggregation network while achieving higher accuracy. We further design a fast version of ACV, named Fast-ACV, to enable real-time performance. Fast-ACV generates high-likelihood disparity hypotheses and the corresponding attention weights from low-resolution correlation clues, which significantly reduces computational and memory cost while maintaining satisfactory accuracy. Finally, based on ACV and Fast-ACV, we design a highly accurate network, ACVNet, and a real-time network, Fast-ACVNet, respectively, both of which achieve state-of-the-art performance on several benchmarks.
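
The core idea described above (attention weights derived from a correlation volume filtering a concatenation volume) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the function names, the channel-mean correlation, the plain softmax over disparity, and the top-k hypothesis selection used to hint at Fast-ACV are all assumptions made for illustration. A real network would learn the attention weights and operate on learned features.

```python
import numpy as np


def correlation_volume(left, right, max_disp):
    """Channel-mean correlation of (C, H, W) features; returns (D, H, W)."""
    _, h, w = left.shape
    vol = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d.
        # Columns with x < d have no match and stay at zero.
        vol[d, :, d:] = (left[:, :, d:] * right[:, :, : w - d]).mean(axis=0)
    return vol


def concatenation_volume(left, right, max_disp):
    """Stack left and shifted right features; returns (2C, D, H, W)."""
    c, h, w = left.shape
    vol = np.zeros((2 * c, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        vol[:c, d, :, d:] = left[:, :, d:]
        vol[c:, d, :, d:] = right[:, :, : w - d]
    return vol


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def attention_concatenation_volume(left, right, max_disp):
    """Weight the concatenation volume by correlation-derived attention.

    Assumption: attention is a plain softmax over the disparity axis of the
    correlation volume. Returns (filtered volume, attention weights).
    """
    attn = softmax(correlation_volume(left, right, max_disp), axis=0)
    concat = concatenation_volume(left, right, max_disp)
    # Broadcast the (D, H, W) weights over the 2C channel axis.
    return attn[None] * concat, attn


def topk_disparity_hypotheses(attn, k):
    """Hypothetical Fast-ACV-style step: keep the k most likely disparities."""
    idx = np.argsort(-attn, axis=0)[:k]
    return np.sort(idx, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal((8, 4, 16))
    right = np.roll(left, -2, axis=2)  # synthetic pair with disparity 2
    acv, attn = attention_concatenation_volume(left, right, max_disp=6)
    print(acv.shape, topk_disparity_hypotheses(attn, k=2).shape)
```

In this sketch the filtered volume keeps the shape of the concatenation volume, so it could be passed to any aggregation stage that already accepts one; restricting later processing to the top-k hypotheses is one plausible way to cut computation and memory, in the spirit of the Fast-ACV description.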