Dense surface reconstruction using a learning-based monocular vSLAM model for laparoscopic surgery

Proc SPIE Int Soc Opt Eng. 2024 Feb;12928:129280J. doi: 10.1117/12.3008768. Epub 2024 Mar 29.

Abstract

Augmented reality (AR) has attracted increasing interest for its application in surgical procedures. AR-guided surgical systems can overlay segmented anatomy from pre-operative imaging onto the user's view of the environment to delineate hard-to-see structures and subsurface lesions intraoperatively. While previous works have utilized pre-operative imaging such as computed tomography or magnetic resonance imaging, registration methods still lack the ability to accurately register deformable anatomical structures across modalities and dimensionalities without fiducial markers. This is especially true for minimally invasive abdominal surgery, which often employs a monocular laparoscope whose single viewpoint provides no direct depth measurement. Surgical scene reconstruction is a critical prerequisite for the accurate registration needed in AR-guided surgery and for other downstream AR applications such as remote assistance or surgical simulation. In this work, we utilize a state-of-the-art (SOTA) deep-learning-based visual simultaneous localization and mapping (vSLAM) algorithm to generate a dense 3D reconstruction, with camera pose estimates and depth maps, from video obtained with a monocular laparoscope. The proposed method robustly reconstructs surgical scenes from real-time data and provides camera pose estimates without stereo cameras or additional sensors, making it more usable and less intrusive. We also demonstrate a framework for evaluating current vSLAM algorithms on non-Lambertian, low-texture surfaces, and we explore the use of the reconstruction outputs in downstream tasks. We expect these evaluation methods to support the continual refinement of newer algorithms for AR-guided surgery.
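To make the described pipeline concrete, the following is a minimal sketch of the dense-fusion step, assuming the vSLAM model supplies per-frame depth maps and camera-to-world poses; the intrinsic matrix, function names, and numeric values are illustrative assumptions and are not taken from the paper.

import numpy as np

def backproject_depth(depth, K):
    """Back-project an (H, W) depth map into camera-space 3D points (N, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0  # keep only pixels with a valid depth estimate
    x = (u.ravel()[valid] - K[0, 2]) * z[valid] / K[0, 0]
    y = (v.ravel()[valid] - K[1, 2]) * z[valid] / K[1, 1]
    return np.stack([x, y, z[valid]], axis=1)

def fuse_frames(depths, poses, K):
    """Transform each frame's points into a shared world frame and concatenate.

    depths : list of (H, W) depth maps predicted by the vSLAM model
    poses  : list of (4, 4) camera-to-world matrices from the pose estimator
    """
    clouds = []
    for depth, T_wc in zip(depths, poses):
        pts_c = backproject_depth(depth, K)
        pts_h = np.hstack([pts_c, np.ones((len(pts_c), 1))])  # homogeneous coords
        clouds.append((T_wc @ pts_h.T).T[:, :3])
    return np.vstack(clouds)

# Illustrative pinhole intrinsics for a 640x480 laparoscopic image (assumed values).
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])

# Example: fuse two synthetic frames related by a small camera translation.
depth = np.full((480, 640), 0.05)   # 5 cm working distance, in metres
T0 = np.eye(4)
T1 = np.eye(4); T1[0, 3] = 0.002    # 2 mm lateral camera motion
cloud = fuse_frames([depth, depth], [T0, T1], K)
print(cloud.shape)                   # (2 * 640 * 480, 3)

In a full system the fused cloud would additionally be filtered and meshed; this sketch shows only the geometric core of turning per-frame pose and depth estimates into a dense point cloud.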

Keywords: 3D reconstruction; MRI; SLAM; augmented reality; deep learning; image-guided surgery; laparoscopy; neural networks.