Purpose: Surgical annotation promotes effective communication between medical personnel during surgical procedures. However, existing 2D annotation approaches are mostly static with respect to the display. In this work, we propose a method to achieve 3D annotations that remain rigidly and stably anchored to target structures as the camera moves in a transnasal endoscopic surgery setting.
Methods: This is accomplished through intra-operative endoscope tracking and monocular depth estimation. A virtual endoscopic environment is used to train a supervised depth estimation network. An adversarial network transfers the style of the real endoscopic view to a synthetic-like view, which is fed to the depth estimation network so that framewise depth can be obtained in real time.
Results: (1) Accuracy: framewise depth was predicted from images captured within a nasal airway phantom and compared with ground truth, achieving an SSIM value of 0.8310 ± 0.0655. (2) Stability: the mean absolute error (MAE) between the reference and predicted depth of a target point was 1.1330 ± 0.9957 mm.
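To make the two reported metrics concrete, the sketch below computes an SSIM value between a ground-truth and a predicted depth map and an MAE at a tracked target point. This is not the authors' evaluation code: the depth maps are synthetic stand-ins, and the SSIM here is a simplified single-window (global) variant of the usual sliding-window formulation.

```python
import numpy as np

def global_ssim(x, y, data_range):
    """Single-window (global) SSIM between two images.

    A simplification of the standard sliding-window SSIM,
    for illustration only; constants follow the usual
    c1 = (0.01*L)^2, c2 = (0.03*L)^2 convention.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(0)
gt = rng.uniform(5.0, 50.0, size=(256, 256))     # synthetic ground-truth depth map (mm)
pred = gt + rng.normal(0.0, 0.5, size=gt.shape)  # noisy stand-in for predicted depth

# (1) Accuracy: framewise SSIM between predicted and ground-truth depth
ssim = global_ssim(gt, pred, data_range=gt.max() - gt.min())

# (2) Stability: MAE between reference and predicted depth of one tracked
# target point (a single frame shown; averaged over frames in practice)
target = (128, 128)
mae = abs(gt[target] - pred[target])

print(f"SSIM: {ssim:.4f}  MAE: {mae:.4f} mm")
```

In the paper's setting, the per-frame target point would come from the tracked annotation anchor rather than a fixed pixel, and the statistics would be aggregated over the full image sequence.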
Conclusion: Both the accuracy and stability evaluations demonstrated the feasibility and practicality of our proposed method for achieving 3D annotations.
Keywords: Augmented reality; Domain transfer learning; Monocular depth estimation; Surgical annotation; Transnasal surgery.