Refined UNet v3: Efficient end-to-end patch-wise network for cloud and shadow segmentation with multi-channel spectral features

Libin Jiao; Lianzhi Huo; Changmiao Hu; Ping Tang

doi:10.1016/j.neunet.2021.08.008

Neural Netw. 2021 Nov:143:767-782. doi: 10.1016/j.neunet.2021.08.008. Epub 2021 Aug 20.

Authors

Libin Jiao¹, Lianzhi Huo², Changmiao Hu³, Ping Tang⁴

Affiliations

¹ Aerospace Information Research Institute (AIR), Chinese Academy of Sciences (CAS), Beijing 100101, China. Electronic address: jiaolb@aircas.ac.cn.
² Aerospace Information Research Institute (AIR), Chinese Academy of Sciences (CAS), Beijing 100101, China. Electronic address: huolz@aircas.ac.cn.
³ Aerospace Information Research Institute (AIR), Chinese Academy of Sciences (CAS), Beijing 100101, China. Electronic address: hucm@aircas.ac.cn.
⁴ Aerospace Information Research Institute (AIR), Chinese Academy of Sciences (CAS), Beijing 100101, China. Electronic address: tangping@aircas.ac.cn.

PMID: 34488013
DOI: 10.1016/j.neunet.2021.08.008

Abstract

Semantic segmentation is one of the essential prerequisites for computer vision tasks, but edge-precise segmentation remains challenging owing to the potential lack of a proper model of the low-level relations between pixels. We previously presented Refined UNet v2, a concatenation of a network backbone and a subsequent embedded conditional random field (CRF) layer, which coarsely performs pixel-wise classification and refines the edges of segmentation regions in a single stage. However, the CRF layer of v2 employs a gray-scale global observation (image) to construct contrast-sensitive bilateral features, which cannot achieve the desired performance on ambiguous edges. In addition, the naïve depth-wise Gaussian filter is not always efficient to compute, especially for a longer-range message-passing step. To address these issues, in this paper we upgrade the bilateral message-passing kernel and the implementation of Gaussian filtering in the CRF layer; the resulting model, referred to as Refined UNet v3, captures ambiguous edges effectively and accelerates the message-passing procedure. Specifically, the inherited UNet is employed to coarsely locate cloud and shadow regions, and the embedded CRF layer refines the edges of the resulting segmentation proposals. The multi-channel guided Gaussian filter is applied to the bilateral message-passing step, which improves the detection of ambiguous edges that are hard for the gray-scale counterpart to identify, and fast Fourier transform-based (FFT-based) Gaussian filtering facilitates an efficient and potentially range-agnostic implementation. Furthermore, Refined UNet v3 can be extended to segmentation on multi-spectral datasets, and the corresponding refinement examination confirms improved shadow retrieval. Experiments and the corresponding results demonstrate that the proposed update can outperform its counterpart in terms of detecting vague edges, retrieving shadows, and handling isolated redundant regions, and that it is practically efficient in our TensorFlow implementation. The demo source code is available at https://github.com/92xianshen/refined-unet-v3.
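
The two ideas named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' TensorFlow implementation: every function name, the 1-D signal, the circular boundary handling, the channel-mean gray-scale conversion, and the value theta = 0.3 are assumptions made purely for illustration. The first part shows why a multi-channel bilateral feature may separate two pixels that a gray-scale feature treats as identical; the second shows Gaussian filtering as a pointwise product in the Fourier domain, whose cost does not depend on the kernel range.

```python
import cmath
import math


def bilateral_weight(f_i, f_j, theta):
    """Gaussian affinity exp(-||f_i - f_j||^2 / (2 * theta^2)) of two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(f_i, f_j))
    return math.exp(-sq_dist / (2.0 * theta ** 2))


def to_gray(pixel):
    """Assumed gray-scale conversion: plain mean of the channels."""
    return [sum(pixel) / len(pixel)]


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def gaussian_kernel_circular(n, sigma):
    """Normalized Gaussian on a ring of n samples, centred at index 0."""
    raw = [math.exp(-min(d, n - d) ** 2 / (2.0 * sigma ** 2)) for d in range(n)]
    total = sum(raw)
    return [v / total for v in raw]


def gaussian_filter_fft(signal, sigma):
    """Circular Gaussian filtering via the convolution theorem."""
    n = len(signal)
    spectrum = fft(signal)
    kernel_spectrum = fft(gaussian_kernel_circular(n, sigma))
    product = [s * k for s, k in zip(spectrum, kernel_spectrum)]
    # The inverse transform above is unnormalized, so divide by n here.
    return [v.real / n for v in fft(product, inverse=True)]


def gaussian_filter_direct(signal, sigma):
    """Reference O(n^2) circular convolution with the same kernel."""
    n = len(signal)
    kernel = gaussian_kernel_circular(n, sigma)
    return [sum(signal[j] * kernel[(i - j) % n] for j in range(n)) for i in range(n)]


# Two hypothetical 3-channel pixels with the same mean intensity.
pixel_a = [0.2, 0.5, 0.8]
pixel_b = [0.8, 0.5, 0.2]
gray_w = bilateral_weight(to_gray(pixel_a), to_gray(pixel_b), theta=0.3)
multi_w = bilateral_weight(pixel_a, pixel_b, theta=0.3)

# A unit impulse filtered both ways; the two results should agree.
impulse = [0.0] * 16
impulse[5] = 1.0
smoothed_fft = gaussian_filter_fft(impulse, sigma=2.0)
smoothed_direct = gaussian_filter_direct(impulse, sigma=2.0)
```

In this toy setting the gray-scale weight stays close to 1 (the pixels look identical), whereas the multi-channel weight is close to 0, so messages would pass far less freely across that boundary. In the FFT variant, changing sigma only changes the kernel values, not the number of operations, which is one way to read "potentially range-agnostic".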

Keywords: Conditional random fields; Efficient implementation; Neural network; Semantic segmentation.

MeSH terms

Image Processing, Computer-Assisted*
Neural Networks, Computer*