Semantic Guidance Fusion Network for Cross-Modal Semantic Segmentation

Pan Zhang; Ming Chen; Meng Gao

doi:10.3390/s24082473

Sensors (Basel). 2024 Apr 12;24(8):2473. doi: 10.3390/s24082473.

Authors

Pan Zhang¹, Ming Chen¹, Meng Gao¹

Affiliation

¹ College of Information, Shanghai Ocean University, No. 999 Hucheng Ring Road, Shanghai 201306, China.

Abstract

Leveraging data from multiple modalities is a well-regarded approach to improving semantic segmentation. Recently, efforts have been made to incorporate a range of modalities, including depth and thermal imaging. Nevertheless, fusing cross-modal information effectively remains a challenge, given the distinct characteristics of each modality. In this work, we introduce the semantic guidance fusion network (SGFN), a cross-modal fusion network capable of integrating a diverse set of modalities. In particular, the SGFN features a semantic guidance module (SGM) designed to improve bi-modal feature extraction. The SGM includes a learnable semantic guidance convolution (SGC) that merges intensity and gradient information from the different modalities. Comprehensive experiments on the NYU Depth V2, SUN-RGBD, Cityscapes, MFNet, and ZJU datasets demonstrate both the superior performance and the generalization ability of the SGFN compared with current leading models. Moreover, when tested on the DELIVER dataset, our bi-modal SGFN achieved an mIoU comparable to that of the previous leading model, CMNeXt.
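For readers unfamiliar with the metric cited above, the following is a minimal sketch of how mean intersection over union (mIoU) is commonly computed from predicted and ground-truth label maps. It is not taken from the paper: the function name `mean_iou`, its parameters, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and the authors' evaluation code may differ.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over flattened label sequences (illustrative helper, not the authors' code).

    pred, target: equal-length sequences of integer class labels in [0, num_classes).
    ignore_index: optional ground-truth label to exclude (e.g. an unlabeled marker).
    """
    # confusion[t][p] counts pixels whose true class is t and predicted class is p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                                 # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # predicted c, true otherwise
        union = tp + fp + fn
        if union > 0:  # assumption: classes absent from both maps are skipped
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy example the per-class IoUs are 1/3, 2/3, and 1/2, so `score` is 0.5. Benchmark protocols vary in how they treat ignored labels and absent classes, so reported mIoU values are only comparable under the same protocol.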

Keywords: cross-modal interactions; semantic guidance module; semantic segmentation.

Grants and funding

No. 2021B0202070001/Research and Development Planning in Key Areas of Guangdong Province