Multispectral image fusion based pedestrian detection using a multilayer fused deconvolutional single-shot detector

J Opt Soc Am A Opt Image Sci Vis. 2020 May 1;37(5):768-779. doi: 10.1364/JOSAA.386410.

Abstract

Recent research has demonstrated that effective fusion of multispectral images (visible and thermal images) enables robust pedestrian detection under various illumination conditions (e.g., daytime and nighttime). However, open problems remain, such as poor performance on small-sized pedestrians and the high computational cost of multispectral information fusion. This paper proposes a multilayer fused deconvolutional single-shot detector that contains a two-stream convolutional module (TCM) and a multilayer fused deconvolutional module (MFDM). The TCM extracts convolutional features from the multispectral input images. Fusion blocks are then incorporated into the MFDM to combine high-level features, which carry rich semantic information, with low-level features, which carry detailed information, generating features with strong representational power for small pedestrian instances. In addition, we fuse multispectral information at multiple deconvolutional layers in the MFDM via fusion blocks. This multilayer fusion strategy adaptively makes full use of visible and thermal information, and using fusion blocks for multilayer fusion reduces extra computational cost and redundant parameters. Empirical experiments show that the proposed approach achieves an 81.82% average precision (AP) on a new small-sized multispectral pedestrian dataset. The proposed method achieves the best performance on two well-known public multispectral datasets. On the KAIST multispectral pedestrian benchmark, for example, our method achieves a 97.36% AP and a 20 fps detection speed, outperforming the best previously published method by 6.82% in AP while running three times faster.
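To make the fusion-block idea concrete, the following is a minimal NumPy sketch, not the paper's actual architecture: the layer widths, the nearest-neighbour upsampling (standing in for a learned deconvolution), and the single 1x1 convolution used to fuse the concatenated visible, thermal, and upsampled high-level features are all illustrative assumptions.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling (stand-in for a learned deconvolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear mix of channels."""
    return x @ w  # (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)

def fusion_block(vis_feat, th_feat, high_feat, w):
    """Hypothetical fusion block: concatenate visible/thermal low-level
    features with upsampled high-level features, then fuse via a 1x1 conv."""
    high_up = upsample2x(high_feat)  # match the low-level spatial resolution
    cat = np.concatenate([vis_feat, th_feat, high_up], axis=-1)
    return conv1x1(cat, w)

rng = np.random.default_rng(0)
vis = rng.standard_normal((16, 16, 64))          # visible-stream low-level features
th = rng.standard_normal((16, 16, 64))           # thermal-stream low-level features
hi = rng.standard_normal((8, 8, 128))            # high-level semantic features
w = rng.standard_normal((64 + 64 + 128, 128))    # illustrative 1x1 conv weights

fused = fusion_block(vis, th, hi, w)
print(fused.shape)  # (16, 16, 128): low-level resolution, fused channels
```

Because the fusion happens in a single cheap 1x1 convolution per block, repeating it at several deconvolutional layers adds relatively little computation, which is consistent with the efficiency claim above.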