A self-supervised network-based smoke removal and depth estimation for monocular endoscopic videos

IEEE Trans Vis Comput Graph. 2023 Dec 28:PP. doi: 10.1109/TVCG.2023.3347438. Online ahead of print.

Abstract

In minimally invasive surgery videos, label-free monocular laparoscopic depth estimation is challenging due to smoke. We therefore propose a self-supervised collaborative network for depth estimation with smoke removal in monocular endoscopic video, decomposed into two steps: smoke removal and depth estimation. In the first step, we develop a de-smoking cyclic GAN (DS-cGAN) to mitigate smoke components at different concentrations. The designed generator network comprises a sharpened guide encoding module (SGEM), a residual dense bottleneck module (RDBM), and a refined upsampling convolution module (RUCM), which together restore detailed organ edges and tissue structures. In the second step, a high-resolution residual U-Net (HRR-UNet), consisting of a DepthNet and two PoseNets, is designed to improve depth estimation accuracy, with adjacent frames used for camera self-motion estimation. Notably, the proposed method requires neither manual labeling nor patient computed tomography scans during the training and inference phases. Experiments on the laparoscopic dataset of the Hamlyn Centre show that our method recovers accurate depth information after de-smoking in real surgical scenes while preserving the blood vessels, contours, and textures of the surgical site. The results demonstrate that the proposed method outperforms existing state-of-the-art methods in effectiveness and runs in real time at 94.45 fps, making it promising for clinical application.
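To make the two-step pipeline concrete, the following is a minimal PyTorch sketch of how smoke removal can feed monocular depth and pose estimation. The module internals (layer choices, channel widths) and names such as `DesmokeGenerator` are illustrative assumptions, not the authors' DS-cGAN or HRR-UNet implementations; only the overall flow, de-smoke first, then predict disparity, with PoseNets estimating camera motion from adjacent frames for self-supervision, follows the abstract.

```python
# Minimal sketch of the two-step pipeline (assumed structure, not the paper's code).
import torch
import torch.nn as nn

class DesmokeGenerator(nn.Module):
    """Stand-in for the DS-cGAN generator (SGEM -> RDBM -> RUCM)."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(  # placeholder for the sharpened guide encoder
            nn.Conv2d(3, 64, 7, padding=3), nn.ReLU(inplace=True))
        self.bottleneck = nn.Sequential(  # placeholder for residual dense blocks
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.decode = nn.Conv2d(64, 3, 7, padding=3)  # placeholder for refined upsampling

    def forward(self, x):
        return torch.sigmoid(self.decode(self.bottleneck(self.encode(x))))

class DepthNet(nn.Module):
    """Stand-in depth branch: predicts per-pixel disparity from a single frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x):
        return torch.sigmoid(self.net(x))  # disparity in (0, 1)

class PoseNet(nn.Module):
    """Stand-in pose branch: regresses 6-DoF camera motion between two frames."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 6))

    def forward(self, frame_t, frame_s):
        return self.net(torch.cat([frame_t, frame_s], dim=1))

desmoke, depth_net, pose_net = DesmokeGenerator(), DepthNet(), PoseNet()
smoky_t = torch.rand(1, 3, 256, 320)       # smoky laparoscopic frame at time t
smoky_s = torch.rand(1, 3, 256, 320)       # adjacent (source) frame
clean_t = desmoke(smoky_t)                  # step 1: smoke removal
clean_s = desmoke(smoky_s)
disparity = depth_net(clean_t)              # step 2: monocular disparity/depth
pose_6dof = pose_net(clean_t, clean_s)      # camera ego-motion between the two frames
print(disparity.shape, pose_6dof.shape)     # [1, 1, 256, 320] and [1, 6]
```

In a training loop, the predicted disparity and 6-DoF pose would be combined to warp the adjacent frame into the target view and minimize a photometric reprojection loss, which is the standard self-supervision signal this kind of DepthNet/PoseNet pairing relies on.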