Event-based Semantic Segmentation with Posterior Attention

IEEE Trans Image Process. 2023 Mar 3:PP. doi: 10.1109/TIP.2023.3249579. Online ahead of print.

Abstract

In recent years, attention-based Transformers have swept across the field of computer vision, ushering in a new generation of backbones for semantic segmentation. Nevertheless, semantic segmentation under poor lighting conditions remains an open problem. Moreover, most work on semantic segmentation targets images produced by commodity frame-based cameras with limited frame rates, hindering deployment in autonomous-driving systems that require perception and response within milliseconds. An event camera is a new sensor that generates event data at microsecond resolution and can operate under poor lighting conditions with a high dynamic range. It is therefore promising to leverage event cameras to enable perception where commodity cameras fall short, but algorithms for event data are far from mature. Pioneering researchers stack event data into frames so that event-based segmentation is converted to frame-based segmentation, leaving the distinctive characteristics of event data unexplored. Noticing that event data naturally highlight moving objects, we propose a posterior attention module that adjusts standard attention using the prior knowledge provided by event data. The posterior attention module can be readily plugged into many segmentation backbones. Plugging the posterior attention module into the recently proposed SegFormer network, we obtain EvSegFormer (the event-based version of SegFormer), which achieves state-of-the-art performance on two datasets (MVSEC and DDD-17) collected for event-based segmentation. Code is available at https://github.com/zexiJia/EvSegFormer to facilitate research on event-based vision.
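To make the core idea concrete, the sketch below illustrates one plausible reading of "posterior attention": standard scaled dot-product attention weights are re-weighted by an event-derived prior and renormalized, in the spirit of Bayes' rule. This is a hypothetical NumPy illustration under assumed shapes and names (`posterior_attention`, `event_prior`), not the authors' actual EvSegFormer implementation; consult the linked repository for the real module.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def posterior_attention(q, k, v, event_prior):
    """Scaled dot-product attention modulated by an event-derived prior.

    q, k, v      : (n_tokens, d) query / key / value matrices.
    event_prior  : (n_tokens, n_tokens) nonnegative weights, e.g. higher
                   where event density suggests moving objects (assumption).
    """
    d = q.shape[-1]
    # Standard attention weights play the role of the likelihood.
    attn = softmax(q @ k.T / np.sqrt(d))
    # Multiply by the prior and renormalize each row -> "posterior" weights.
    post = attn * event_prior
    post = post / post.sum(axis=-1, keepdims=True)
    return post @ v
```

With a uniform prior this reduces exactly to standard attention, which is one reason a multiplicative prior is a natural design choice: the module degrades gracefully when event data carries no extra information.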