BSI-MVS: multi-view stereo network with bidirectional semantic information

Ruiming Jia; Jun Yu; Zhenghui Hu; Fei Yuan

doi:10.1038/s41598-024-55612-6

Sci Rep. 2024 Mar 21;14(1):6766. doi: 10.1038/s41598-024-55612-6.

Authors

Ruiming Jia¹, Jun Yu¹, Zhenghui Hu², Fei Yuan³

Affiliations

¹ School of Information Science and Technology, North China University of Technology, Beijing, 100144, China.
² Hangzhou Innovation Institute, Beihang University, Hangzhou, 310051, China.
³ Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 10085, China. yuanfei@iie.ac.cn.

Abstract

Multi-view stereo (MVS) performs 3D reconstruction by extracting depth information from multiple views. Most current state-of-the-art (SOTA) MVS networks are based on the Vision Transformer, which usually entails high computational complexity. To reduce computational complexity and improve depth map accuracy, we propose an MVS network with Bidirectional Semantic Information (BSI-MVS). First, we design a Multi-Level Spatial Pyramid module that generates multiple layers of feature maps for extracting multi-scale information. Then, we propose a 2D Bidirectional-LSTM module that captures bidirectional semantic information at different time steps in the horizontal and vertical directions; this information contains abundant depth information. Finally, cost volumes are built from the feature maps at various levels to optimize the final depth map. We run experiments on the DTU and BlendedMVS datasets. The results show that, in terms of overall metrics, our network surpasses TransMVSNet, CasMVSNet, CVP-MVSNet, and AACVP-MVSNet by 17.84%, 36.42%, 14.96%, and 4.86%, respectively, with a noticeable improvement in both objective metrics and visualizations.

Keywords: 3D reconstruction; Bidirectional-LSTM; Multi-view stereo; Transformer.
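
The 2D Bidirectional-LSTM idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the module's internals, so the per-row and per-column scanning scheme, the hidden size, the channel concatenation, and every function name below (lstm_scan, bidirectional_scan, bilstm_2d, init_params) are assumptions made for illustration. Each row of a feature map is treated as a sequence of "time steps" read left-to-right and right-to-left, and each column as a sequence read top-to-bottom and bottom-to-top.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(rng, c_in, hidden):
    """Random weights for one LSTM direction (hypothetical initialisation)."""
    W = rng.standard_normal((c_in, 4 * hidden)) * 0.1    # input-to-gates
    U = rng.standard_normal((hidden, 4 * hidden)) * 0.1  # hidden-to-gates
    b = np.zeros(4 * hidden)
    return W, U, b

def lstm_scan(seq, W, U, b):
    """Run a standard LSTM over seq of shape (T, B, C); return (T, B, hidden)."""
    T, B, _ = seq.shape
    hidden = U.shape[0]
    h = np.zeros((B, hidden))
    c = np.zeros((B, hidden))
    out = np.empty((T, B, hidden))
    for t in range(T):
        gates = seq[t] @ W + h @ U + b
        i, f, g, o = np.split(gates, 4, axis=1)  # input, forget, cell, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out[t] = h
    return out

def bidirectional_scan(seq, fwd, bwd):
    """Concatenate a forward pass and a time-reversed pass along channels."""
    forward = lstm_scan(seq, *fwd)
    backward = lstm_scan(seq[::-1], *bwd)[::-1]  # re-align to original order
    return np.concatenate([forward, backward], axis=-1)

def bilstm_2d(feat, params_h, params_v):
    """feat: (C, H, W) feature map. Returns (4 * hidden, H, W).

    Assumed layout of the output channels:
    [horizontal forward, horizontal backward, vertical forward, vertical backward].
    """
    # Horizontal: W time steps, the H rows act as the batch.
    horiz = bidirectional_scan(feat.transpose(2, 1, 0), *params_h)  # (W, H, 2h)
    horiz = horiz.transpose(2, 1, 0)                                # (2h, H, W)
    # Vertical: H time steps, the W columns act as the batch.
    vert = bidirectional_scan(feat.transpose(1, 2, 0), *params_v)   # (H, W, 2h)
    vert = vert.transpose(2, 0, 1)                                  # (2h, H, W)
    return np.concatenate([horiz, vert], axis=0)
```

In this sketch every output pixel sees its whole row and whole column: the forward pass carries context from earlier positions and the backward pass from later ones. A trained network would learn the weights and would typically use a deep-learning framework rather than an explicit Python loop.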
