Binocular stereo matching of real scenes based on a convolutional neural network and computer graphics

Opt Express. 2021 Aug 16;29(17):26876-26893. doi: 10.1364/OE.433247.

Abstract

Binocular stereo matching methods based on deep learning have limited cross-domain generalization ability, and obtaining a large amount of training data from real scenes is difficult, so even the most advanced stereo matching networks are hard to apply to new real scenes. In this paper, we propose a real-scene stereo matching method based on a convolutional neural network and computer graphics. A virtual binocular imaging system is built with graphics software and used to construct a high-quality semi-synthetic dataset, close to the texture characteristics of the real scene, for training the network. A feature standardization layer is embedded in the feature extraction module of the proposed network to further reduce the feature-space difference between semi-synthetic data and real-scene data. Three small 4D cost volumes are constructed in place of one large 4D cost volume, which reduces GPU memory consumption and improves the matching performance of the network. Experimental results show that, compared with the traditional stereo matching method, the proposed method improves matching accuracy by about 60%. Compared with other learning-based methods, it improves matching accuracy by about 30% and matching speed by 38%, and it is robust to defocus blur and Gaussian noise.
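
The abstract does not specify how the feature standardization layer is defined. As a rough illustration of the general idea only, the sketch below assumes one plausible reading: each channel of a feature map is rescaled to zero mean and unit variance over its spatial positions. The function names, the `eps` parameter, and the toy feature maps are hypothetical and are not taken from the paper.

```python
import math


def channel_stats(features):
    """Return (mean, variance) for each channel of a C x H x W nested list."""
    stats = []
    for channel in features:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        stats.append((mean, var))
    return stats


def standardize_features(features, eps=1e-5):
    """Rescale each channel to zero mean and (approximately) unit variance.

    Assumption: statistics are computed per channel over all spatial
    positions of a single feature map. The paper may define its layer
    differently.
    """
    out = []
    for channel, (mean, var) in zip(features, channel_stats(features)):
        scale = 1.0 / math.sqrt(var + eps)  # eps avoids division by zero
        out.append([[(v - mean) * scale for v in row] for row in channel])
    return out


# Hypothetical 1-channel, 2 x 2 feature maps with the same spatial pattern
# but different offset and scale, standing in for the two data domains.
semi_synthetic = [[[0.0, 1.0], [2.0, 3.0]]]
real_scene = [[[10.0, 12.0], [14.0, 16.0]]]

std_synthetic = standardize_features(semi_synthetic)
std_real = standardize_features(real_scene)
```

In this toy case the two inputs differ only by a per-channel offset and scale, so their standardized outputs nearly coincide. That is the sense in which such a layer could narrow the gap between semi-synthetic and real-scene feature statistics. It does not remove differences in spatial structure.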