Structural Similarity Loss for Learning to Fuse Multi-Focus Images

Xiang Yan; Syed Zulqarnain Gilani; Hanlin Qin; Ajmal Mian

doi:10.3390/s20226647


Sensors (Basel). 2020 Nov 20;20(22):6647. doi: 10.3390/s20226647.

Authors

Xiang Yan¹, Syed Zulqarnain Gilani², Hanlin Qin¹, Ajmal Mian³

Affiliations

¹ School of Physics and Optoelectronic Engineering, Xidian University, Xi'an 710071, China.
² School of Science, Edith Cowan University, Joondalup, WA 6027, Australia.
³ Computer Science and Software Engineering, The University of Western Australia, Crawley, WA 6009, Australia.

Abstract

Convolutional neural networks (CNNs) have recently been used for multi-focus image fusion. However, some existing methods resort to adding Gaussian blur to focused images to simulate defocus, thereby generating data with ground truth for supervised learning. Moreover, they classify pixels as 'focused' or 'defocused' and use the classification results to construct the fusion weight maps, which then necessitates a series of post-processing steps. In this paper, we present an end-to-end learning approach that directly predicts the fully focused output image from a pair of multi-focus input images. The proposed approach uses a CNN trained to perform fusion without the need for ground-truth fused images. The loss is computed from the structural similarity (SSIM) index, a metric widely accepted for evaluating fused image quality. In addition, when designing the loss function, we use the standard deviation of a local image window to automatically estimate the importance of each source image in the final fused image. Because our network accepts images of variable sizes, we are able to train it on real benchmark datasets instead of simulated ones. The model is a feed-forward, fully convolutional neural network that can also process images of variable sizes at test time. Extensive evaluation on benchmark datasets shows that our method outperforms, or is comparable with, existing state-of-the-art techniques on both objective and subjective benchmarks.
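The abstract describes the loss only at a high level. The following is a minimal NumPy sketch of a local-standard-deviation-weighted SSIM loss, not the authors' exact formulation. It assumes a uniform 7×7 window evaluated at every valid position (no padding), the standard SSIM constants C1 = (0.01·L)² and C2 = (0.03·L)² for images in [0, 1], and a hypothetical soft weight w_A = σ_A / (σ_A + σ_B) that favours whichever source has more local contrast (i.e. is more likely in focus). The function names `local_ssim`, `local_std` and `weighted_ssim_loss` are illustrative only; an actual training setup would compute the same quantities in a differentiable framework so that gradients reach the CNN.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _local_stats(x, y, win):
    """Means, variances and covariance over every valid win x win window."""
    xw = sliding_window_view(x, (win, win))  # shape (H-win+1, W-win+1, win, win)
    yw = sliding_window_view(y, (win, win))
    mu_x = xw.mean(axis=(-2, -1))
    mu_y = yw.mean(axis=(-2, -1))
    var_x = xw.var(axis=(-2, -1))
    var_y = yw.var(axis=(-2, -1))
    cov = ((xw - mu_x[..., None, None]) * (yw - mu_y[..., None, None])).mean(axis=(-2, -1))
    return mu_x, mu_y, var_x, var_y, cov


def local_ssim(x, y, win=7, data_range=1.0):
    """SSIM map between x and y using a uniform window (assumed, not from the paper)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y, var_x, var_y, cov = _local_stats(x, y, win)
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def local_std(x, win=7):
    """Standard deviation of each valid win x win window."""
    return sliding_window_view(x, (win, win)).std(axis=(-2, -1))


def weighted_ssim_loss(src_a, src_b, fused, win=7, eps=1e-4):
    """1 - mean weighted SSIM; the weighting rule is a hypothetical choice."""
    ssim_a = local_ssim(src_a, fused, win)
    ssim_b = local_ssim(src_b, fused, win)
    # Floor the std so flat (e.g. defocused) regions do not cause 0/0.
    std_a = np.maximum(local_std(src_a, win), eps)
    std_b = np.maximum(local_std(src_b, win), eps)
    w_a = std_a / (std_a + std_b)  # higher local contrast -> larger weight
    score = w_a * ssim_a + (1.0 - w_a) * ssim_b
    return 1.0 - score.mean()
```

For example, if `src_a` is textured and `src_b` is a flat, "defocused" version of the scene, `weighted_ssim_loss(src_a, src_b, src_a)` comes out much smaller than `weighted_ssim_loss(src_a, src_b, src_b)`, so minimising this loss pushes the fused output toward the sharper source in each window without needing a ground-truth fused image. A soft weight is used here so the sketch stays smooth; a hard "pick the larger σ" rule would be an equally plausible reading of the abstract.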

Keywords: convolutional neural network; multi-focus image fusion; structural similarity; unsupervised learning.

Grants and funding

61901330/National Natural Science Foundation of China