Unified DeepLabV3+ for Semi-Dark Image Semantic Segmentation

Mehak Maqbool Memon; Manzoor Ahmed Hashmani; Aisha Zahid Junejo; Syed Sajjad Rizvi; Kamran Raza

doi:10.3390/s22145312

Sensors (Basel). 2022 Jul 15;22(14):5312. doi: 10.3390/s22145312.

Authors

Mehak Maqbool Memon¹, Manzoor Ahmed Hashmani¹, Aisha Zahid Junejo¹, Syed Sajjad Rizvi², Kamran Raza³

Affiliations

¹ High Performance Cloud Computing Center (HPC3), Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia.
² Department of Computer Science, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology, Karachi 75600, Pakistan.
³ Faculty of Engineering Science and Technology, Iqra University, Karachi 75600, Pakistan.

Abstract

Semantic segmentation for accurate visual perception is a critical task in computer vision. In principle, the automatic classification of dynamic visual scenes into predefined object classes remains unresolved. The challenges of training deep convolutional neural networks, specifically ResNet-based DeepLabV3+ (the most recent version), are threefold. They arise from (1) biased centric exploitation of filter masks, (2) the lower representational power of residual networks caused by identity shortcuts, and (3) the loss of spatial relationships caused by per-pixel primitives. To solve these problems, we present Unified DeepLabV3+, an improved approach based on DeepLabV3+, together with a new evaluation metric, S3core. The unified version reduced the effect of biased exploitation through additional dilated convolution layers with customized dilation rates. We further addressed the limited representational power by introducing non-linear group normalization shortcuts, targeting the specific problem of semi-dark images. Meanwhile, geometrically bunched pixel cues were used to preserve spatial relationships in both the global and local contexts. We combined all the proposed variants of DeepLabV3+ into Unified DeepLabV3+ for accurate visual decisions. Finally, the proposed S3core evaluation metric is a weighted combination of three accuracy measures, i.e., pixel accuracy, IoU (intersection over union), and Mean BFScore, and serves as a robust identification criterion. Extensive experiments on the CamVid dataset confirmed the applicability of the proposed solution to autonomous vehicles and robotics in outdoor settings. The results showed that Unified DeepLabV3+ outperformed DeepLabV3+ by a margin of 3% in class-wise pixel accuracy and achieved a higher S3core, demonstrating the effectiveness of the proposed approach.
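S3core is described here only as a weighted combination of pixel accuracy, IoU, and Mean BFScore; the abstract does not give the weights, the class-averaging scheme, or the boundary tolerance. The Python/NumPy sketch below is therefore an illustration under stated assumptions, not the authors' implementation. It assumes hypothetical equal weights of 1/3, global pixel accuracy, mean IoU over the classes that occur, and a simplified per-class boundary F1 (BF) score that matches 4-neighbour boundary pixels within a square tolerance of `tol` pixels. All function names (`confusion_matrix`, `mean_bf_score`, `s3core`, and so on) are hypothetical.

```python
import numpy as np


def confusion_matrix(pred, gt, num_classes):
    """Rows index ground-truth classes, columns index predicted classes."""
    idx = gt.astype(np.int64).ravel() * num_classes + pred.astype(np.int64).ravel()
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Fraction of all pixels whose predicted label equals the ground truth."""
    return float(np.trace(cm) / cm.sum())


def mean_iou(cm):
    """Intersection over union per class, averaged over classes seen in gt or pred."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0
    return float(np.mean(tp[present] / union[present]))


def _boundary(mask):
    """Pixels of a boolean mask with at least one 4-neighbour outside the mask.
    Edge padding means the image border itself is not treated as a boundary."""
    p = np.pad(mask, 1, mode="edge")
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior


def _dilate(mask, r):
    """Square (chessboard-distance) dilation of a boolean mask by r pixels."""
    h, w = mask.shape
    p = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= p[dy:dy + h, dx:dx + w]  # union of all shifts within r
    return out


def mean_bf_score(pred, gt, num_classes, tol=2):
    """Simplified boundary F1 per class, averaged over classes with a boundary."""
    scores = []
    for c in range(num_classes):
        pb, gb = _boundary(pred == c), _boundary(gt == c)
        if not pb.any() and not gb.any():
            continue  # no boundary for this class in either map: skip it
        if not pb.any() or not gb.any():
            scores.append(0.0)  # boundary present in only one of the maps
            continue
        precision = (pb & _dilate(gb, tol)).sum() / pb.sum()
        recall = (gb & _dilate(pb, tol)).sum() / gb.sum()
        total = precision + recall
        scores.append(0.0 if total == 0 else float(2 * precision * recall / total))
    return float(np.mean(scores)) if scores else float("nan")


def s3core(pix_acc, miou, mbf, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted combination of the three measures (the weights are hypothetical)."""
    w = np.asarray(weights, dtype=float)
    if w.shape != (3,) or np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("expected three non-negative weights summing to 1")
    return float(w @ np.array([pix_acc, miou, mbf], dtype=float))


if __name__ == "__main__":
    gt = np.zeros((8, 8), dtype=int)
    gt[:, 4:] = 1                     # ground truth: two vertical halves
    pred = np.zeros_like(gt)          # a model that predicts class 0 everywhere
    cm = confusion_matrix(pred, gt, num_classes=2)
    score = s3core(pixel_accuracy(cm), mean_iou(cm), mean_bf_score(pred, gt, 2))
    print(round(score, 4))            # 0.25
```

In this toy case the degenerate prediction still gets 50% pixel accuracy, but its mean IoU (0.25) and BF score (0.0) pull the combined value down. This is the kind of behaviour a combined criterion is meant to expose. The equal weights are only a placeholder; the weighting actually used for S3core should be taken from the full paper.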

Keywords: atrous convolutions; high-resolution images; semantic segmentation; super-pixels; urban environments.

MeSH terms

Image Processing, Computer-Assisted* / methods
Neural Networks, Computer
Semantics*

Grants and funding

015MEO-227/Iqra University, Pakistan, and Universiti Teknologi PETRONAS (UTP), Malaysia.