Joint estimation of depth and motion from a monocular endoscopy image sequence using a multi-loss rebalancing network

Shiyuan Liu; Jingfan Fan; Dengpan Song; Tianyu Fu; Yucong Lin; Deqiang Xiao; Hong Song; Yongtian Wang; Jian Yang

doi:10.1364/BOE.457475

Biomed Opt Express. 2022 Apr 11;13(5):2707-2727. doi: 10.1364/BOE.457475. eCollection 2022 May 1.

Authors

Shiyuan Liu¹, Jingfan Fan¹,², Dengpan Song¹, Tianyu Fu¹, Yucong Lin¹, Deqiang Xiao¹, Hong Song³, Yongtian Wang¹,⁴, Jian Yang¹

Affiliations

¹ Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China.
² fjf@bit.edu.cn.
³ School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
⁴ wyt@bit.edu.cn.

Abstract

Building an in vivo three-dimensional (3D) surface model from monocular endoscopy is an effective way to improve the intuitiveness and precision of clinical laparoscopic surgery. This paper proposes a multi-loss rebalancing-based method for joint estimation of depth and motion from a monocular endoscopy image sequence. Feature descriptors provide supervisory signals for the depth estimation network and the motion estimation network. The depth estimation network incorporates the epipolar constraints between sequence frames into the neighborhood spatial information to improve the accuracy of depth estimation. The motion estimation network uses the reprojection information from depth estimation to reconstruct the camera motion through a multi-view relative pose fusion mechanism. Relative response loss, feature consistency loss, and epipolar consistency loss functions are defined to improve the robustness and accuracy of the proposed unsupervised learning-based method. The method is evaluated on public datasets. The motion estimation error in three scenes decreased by 42.1%, 53.6%, and 50.2%, respectively, and the average 3D reconstruction error is 6.456 ± 1.798 mm. These results demonstrate that the method can generate reliable depth estimation and trajectory reconstruction results for endoscopy images and has meaningful clinical applications.
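
The abstract does not give the form of the epipolar consistency loss, so the following is only an illustrative sketch of the underlying geometric idea, not the authors' formulation. It assumes a pinhole camera with known intrinsics `K`, a relative pose `(R, t)` mapping frame-1 coordinates to frame-2 coordinates (X2 = R X1 + t), and matched pixel coordinates in both frames. It uses the standard algebraic epipolar residual |x2ᵀ F x1|, which is zero for perfectly consistent matches. All function names here are hypothetical.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def fundamental_from_pose(K, R, t):
    """Fundamental matrix F = K^-T [t]_x R K^-1 for the pose X2 = R X1 + t.

    Assumes both frames share the same intrinsic matrix K (an assumption
    made for this sketch, natural for a single monocular endoscope).
    """
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv

def epipolar_loss(pts1, pts2, F):
    """Mean absolute algebraic epipolar residual |x2^T F x1|.

    pts1, pts2: (N, 2) arrays of matched pixel coordinates in frames 1 and 2.
    """
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])  # homogeneous pixel coordinates, frame 1
    x2 = np.hstack([pts2, ones])  # homogeneous pixel coordinates, frame 2
    residual = np.einsum("ni,ij,nj->n", x2, F, x1)
    return float(np.mean(np.abs(residual)))

def project(K, X):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    x = (K @ X.T).T
    return x[:, :2] / x[:, 2:3]
```

In a learning setting, a residual of this kind would typically be computed in a differentiable framework and combined with the other loss terms; the weighting used by the proposed rebalancing scheme is not specified in the abstract.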