Image Segmentation Using Encoder-Decoder with Deformable Convolutions

Sensors (Basel). 2021 Feb 24;21(5):1570. doi: 10.3390/s21051570.

Abstract

Image segmentation is an essential step in image analysis that assigns meaning to the pixels of an image. It is also a difficult task, because no generally suitable approach exists and because real-life pictures can suffer from noise or object occlusion. This paper proposes an architecture for semantic segmentation using a convolutional neural network based on the Xception model, which was previously used for classification. Several experiments were carried out to find the best-performing configuration of the model (e.g., different input resolutions, different network depths, and data augmentation techniques). Additionally, the network was improved by adding a deformable convolution module. The proposed architecture obtained a mean intersection over union (IoU) of 76.8 on the Pascal VOC 2012 dataset and 58.1 on the Cityscapes dataset. It outperforms the SegNet and U-Net networks, both of which have considerably more parameters and a higher inference time.

Keywords: Xception model; convolutional neural network; deformable convolutions; image segmentation; mean intersection over union.
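
For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a mean IoU score is commonly computed from a confusion matrix. It is not the authors' evaluation code: the function name `mean_iou`, the toy label maps, and the choice to skip classes absent from both the ground truth and the prediction are assumptions made for illustration, and benchmark protocols such as those of Pascal VOC 2012 and Cityscapes may differ in details (for example, ignored labels and accumulation over the whole dataset).

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection over union for flattened integer label maps.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # confusion[t][p] counts pixels whose true class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]                             # true positives for class c
        fn = sum(confusion[c]) - tp                      # class c pixels that were missed
        fp = sum(row[c] for row in confusion) - tp       # pixels wrongly labeled as c
        union = tp + fp + fn
        if union > 0:  # assumption: skip classes absent from both maps
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: a flattened 2x3 image with three classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(truth, pred, num_classes=3)
print(score)
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2. Published scores such as those in the abstract are usually quoted on a 0 to 100 scale.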