Self-Supervised Monocular Depth Estimation With Self-Perceptual Anomaly Handling

IEEE Trans Neural Netw Learn Syst. 2023 Aug 15:PP. doi: 10.1109/TNNLS.2023.3301711. Online ahead of print.

Abstract

It is attractive to extract plausible 3-D information from a single 2-D image, and self-supervised learning has shown impressive potential in this field. However, when only monocular videos are available as training data, objects moving at speeds similar to the camera's can disturb the reprojection process during training. Existing methods filter out some moving pixels by comparing pixelwise photometric error, but illumination inconsistency between frames leads to incomplete filtering. In addition, existing methods calculate photometric error within local windows, so even if an anomalous pixel is masked out, it can still implicitly disturb the reprojection process as long as it lies in the local neighborhood of a nonanomalous pixel. Moreover, the ill-posed nature of monocular depth estimation allows the same scene to correspond to multiple plausible depth maps, which undermines the robustness of the model. To alleviate the above problems, we propose: 1) a self-reprojection mask that further filters out moving objects while avoiding illumination inconsistency; 2) a self-statistical mask method that prevents the filtered anomalous pixels from implicitly disturbing the reprojection; and 3) a self-distillation augmentation consistency loss that reduces the impact of the ill-posed nature of monocular depth estimation. Our method shows superior performance on the KITTI dataset, especially when evaluating only the depth of potential moving objects.
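For background, the pixelwise photometric-error comparison that the abstract says existing methods use to filter moving pixels can be sketched as follows. This is a minimal numpy illustration of the standard auto-masking idea (in the spirit of Monodepth2), not the paper's proposed self-reprojection or self-statistical masks; all function names and the toy images are our own assumptions.

```python
import numpy as np

def l1_error(a, b):
    """Per-pixel L1 photometric error between two images of shape (H, W, C)."""
    return np.abs(a - b).mean(axis=-1)

def automask(target, warped_source, source):
    """Standard auto-mask (assumed, Monodepth2-style): keep a pixel only if
    warping the source frame explains it strictly better than the raw,
    unwarped source does.  A pixel on an object moving with the camera looks
    identical in both frames, so the unwarped error is already zero and the
    strict inequality fails -- the pixel is masked out.  Returns (H, W) bool."""
    reproj_err = l1_error(target, warped_source)
    identity_err = l1_error(target, source)
    return reproj_err < identity_err

# Toy example: the left half of the scene is static (the source frame differs
# there, and a perfect warp removes the difference); the right half mimics an
# object moving with the camera (identical in both frames), so it is masked.
target = np.zeros((4, 4, 3))
source = np.zeros((4, 4, 3))
source[:, :2] = 1.0                 # static region: source differs from target
warped = np.zeros((4, 4, 3))        # a perfect warp matches the target everywhere
mask = automask(target, warped, source)
```

As the abstract notes, this per-pixel comparison is incomplete under illumination changes, and because the photometric error is in practice computed over local windows, a masked pixel can still leak into its neighbors' errors — the motivation for the paper's self-reprojection and self-statistical masks.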