EDPNet: An Encoding-Decoding Network with Pyramidal Representation for Semantic Image Segmentation

Dong Chen; Xianghong Li; Fan Hu; P Takis Mathiopoulos; Shaoning Di; Mingming Sui; Jiju Peethambaran

doi:10.3390/s23063205

Sensors (Basel). 2023 Mar 17;23(6):3205. doi: 10.3390/s23063205.

Authors

Dong Chen¹, Xianghong Li¹, Fan Hu¹, P Takis Mathiopoulos², Shaoning Di³, Mingming Sui¹, Jiju Peethambaran⁴

Affiliations

¹ College of Civil Engineering, Nanjing Forestry University, Nanjing 210037, China.
² Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece.
³ School of Geosciences and Info Physics, Central South University, Changsha 410083, China.
⁴ Department of Mathematics and Computing Science, Saint Mary's University, Halifax, NS B3P 2M6, Canada.

Abstract

This paper proposes EDPNet, an encoding-decoding network with a pyramidal representation module, designed for efficient semantic image segmentation. During encoding, an enhanced version of the Xception network, Xception+, is employed as the backbone to learn discriminative feature maps. These features are then fed into the pyramidal representation module, which learns and optimizes context-augmented features through multi-level feature representation and aggregation. During decoding, the encoded semantic-rich features are progressively recovered with the assistance of a simplified skip connection mechanism, which concatenates, along the channel dimension, high-level encoded features carrying rich semantic information with low-level features carrying spatial detail. This hybrid representation, which combines the encoding-decoding and pyramidal structures, has global-aware perception and captures the fine-grained contours of various geographical objects with high computational efficiency. The performance of EDPNet was compared against PSPNet, DeepLabv3, and U-Net on four benchmark datasets: eTRIMS, Cityscapes, PASCAL VOC2012, and CamVid. EDPNet achieved the highest accuracy on the eTRIMS and PASCAL VOC2012 datasets, with mIoUs of 83.6% and 73.8%, respectively, while its accuracy on the other two datasets was comparable to that of PSPNet, DeepLabv3, and U-Net. EDPNet also achieved the highest efficiency among the compared models on all datasets.

Keywords: convolution neural network; encoder–decoder network; pyramidal representation; semantic parsing; semantic segmentation.
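
As a rough illustration of two ideas mentioned in the abstract, the sketch below shows (1) a skip connection that concatenates upsampled high-level features with low-level features along the channel axis, and (2) the mean intersection-over-union (mIoU) metric used to report accuracy. This is not the authors' implementation: the function names `upsample_nearest`, `fuse_skip`, and `mean_iou`, the channel-first `(C, H, W)` layout, and the choice of nearest-neighbour upsampling are all assumptions made for the example, and the abstract does not specify how EDPNet upsamples or how it handles classes absent from an image.

```python
import numpy as np


def upsample_nearest(features, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor.

    Assumption: the abstract does not state which upsampling EDPNet uses.
    """
    return features.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_skip(high_level, low_level):
    """Concatenate upsampled high-level features with low-level features.

    Both inputs are (C, H, W) arrays; the low-level map must be an integer
    multiple of the high-level map in both spatial dimensions.
    """
    h_hi, w_hi = high_level.shape[1:]
    h_lo, w_lo = low_level.shape[1:]
    if h_lo % h_hi or w_lo % w_hi or h_lo // h_hi != w_lo // w_hi:
        raise ValueError("spatial sizes must differ by one integer factor")
    upsampled = upsample_nearest(high_level, h_lo // h_hi)
    # Channel concatenation: semantic channels first, spatial-detail channels after.
    return np.concatenate([upsampled, low_level], axis=0)


def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the target."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps; skipped (an assumption)
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))


if __name__ == "__main__":
    high = np.ones((4, 2, 2))   # hypothetical high-level (semantic) features
    low = np.zeros((3, 8, 8))   # hypothetical low-level (spatial) features
    print(fuse_skip(high, low).shape)

    pred = np.array([[0, 1], [1, 1]])
    target = np.array([[0, 1], [0, 1]])
    print(mean_iou(pred, target, num_classes=2))
```

In a real network the concatenated tensor would normally pass through further convolutions before the next decoding stage; that step is omitted here because the abstract does not describe it.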
