Anchor-guided online meta adaptation for fast one-shot instrument segmentation from robotic surgical videos

Zixu Zhao; Yueming Jin; Junming Chen; Bo Lu; Chi-Fai Ng; Yun-Hui Liu; Qi Dou; Pheng-Ann Heng

doi:10.1016/j.media.2021.102240

Med Image Anal. 2021 Dec:74:102240. doi: 10.1016/j.media.2021.102240. Epub 2021 Sep 20.

Authors

Zixu Zhao¹, Yueming Jin², Junming Chen¹, Bo Lu³, Chi-Fai Ng⁴, Yun-Hui Liu³, Qi Dou⁵, Pheng-Ann Heng⁵

Affiliations

¹ Department of Computer Science and Engineering, The Chinese University of Hong Kong, HKSAR, China.
² Department of Computer Science and Engineering, The Chinese University of Hong Kong, HKSAR, China. Electronic address: ymjin@cse.cuhk.edu.hk.
³ Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, HKSAR, China; T-Stone Robotics Institute, The Chinese University of Hong Kong, HKSAR, China.
⁴ Department of Surgery, The Chinese University of Hong Kong, HKSAR, China.
⁵ Department of Computer Science and Engineering, The Chinese University of Hong Kong, HKSAR, China; T-Stone Robotics Institute, The Chinese University of Hong Kong, HKSAR, China.

PMID: 34614476
DOI: 10.1016/j.media.2021.102240

Abstract

The scarcity of annotated surgical data in robot-assisted surgery (RAS) has motivated prior works to borrow knowledge from related domains and adapt it to achieve promising segmentation results on surgical images. For dense instrument tracking in a robotic surgical video, collecting one initial scene to specify the target instruments (or parts of tools) is desirable and feasible during preoperative preparation. In this paper, we study the challenging task of one-shot instrument segmentation for robotic surgical videos, in which only the first-frame mask of each video is provided at test time, so that the pre-trained model (learned from an easily accessible source) can adapt to the target instruments. Straightforward methods transfer the domain knowledge by fine-tuning the model on each given mask. Such one-shot optimization takes hundreds of iterations, making the test runtime infeasible. We present anchor-guided online meta adaptation (AOMA) for this problem. We achieve fast one-shot test-time optimization by meta-learning a good model initialization and learning rates from source videos, avoiding laborious, handcrafted fine-tuning. The two trainable components are optimized in a video-specific task space with a matching-aware loss. Furthermore, we design an anchor-guided online adaptation to tackle the performance drop throughout a robotic surgical sequence: the model is continuously adapted on motion-insensitive pseudo-masks supported by anchor matching. AOMA achieves state-of-the-art results in two practical scenarios, (1) general videos to surgical videos and (2) public surgical videos to in-house surgical videos, while substantially reducing the test runtime.
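The idea of meta-learning both an initialization and the learning rates, so that a single gradient step on one labeled example is enough at test time, can be illustrated with a toy sketch. This is not the authors' implementation: it replaces the segmentation network with a one-parameter regression model, uses a plain squared error instead of the matching-aware loss, and leaves out the anchor-guided online adaptation entirely. All names (`sample_task`, `adapt`, `meta_train`, `mean_query_loss`) and all hyperparameters are hypothetical choices made for illustration.

```python
import random


def support_grad(w, x_s, y_s):
    # Gradient of 0.5 * (w * x_s - y_s)**2 with respect to w.
    return (w * x_s - y_s) * x_s


def adapt(w0, alpha, x_s, y_s):
    # One-shot adaptation: a single gradient step on the one support example,
    # using the meta-learned step size alpha instead of a hand-tuned one.
    return w0 - alpha * support_grad(w0, x_s, y_s)


def query_loss(w, x_q, y_q):
    return 0.5 * (w * x_q - y_q) ** 2


def sample_task(rng):
    # Hypothetical task family: each "video" is a line y = a * x with its own
    # slope a. The support pair plays the role of the annotated first frame,
    # the query pair the role of a later frame.
    a = rng.uniform(2.0, 4.0)
    x_s = rng.uniform(0.8, 1.2)
    x_q = rng.uniform(0.8, 1.2)
    return x_s, a * x_s, x_q, a * x_q


def meta_train(steps=2000, meta_lr=0.01, seed=0):
    rng = random.Random(seed)
    w0, alpha = 0.0, 0.01  # initialization and learning rate, both trainable
    for _ in range(steps):
        x_s, y_s, x_q, y_q = sample_task(rng)
        g = support_grad(w0, x_s, y_s)
        w_adapted = w0 - alpha * g
        # Gradient of the query loss with respect to the adapted weight.
        g_q = (w_adapted * x_q - y_q) * x_q
        # Chain rule through the inner step w_adapted = w0 - alpha * g,
        # where d(g)/d(w0) = x_s**2 for this quadratic support loss.
        grad_w0 = g_q * (1.0 - alpha * x_s * x_s)
        grad_alpha = -g_q * g
        w0 -= meta_lr * grad_w0
        alpha -= meta_lr * grad_alpha
    return w0, alpha


def mean_query_loss(w0, alpha, n_tasks=200, seed=1):
    # Average post-adaptation loss over freshly sampled tasks.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_tasks):
        x_s, y_s, x_q, y_q = sample_task(rng)
        total += query_loss(adapt(w0, alpha, x_s, y_s), x_q, y_q)
    return total / n_tasks


if __name__ == "__main__":
    w0, alpha = meta_train()
    print("untrained:", mean_query_loss(0.0, 0.01))
    print("meta-trained:", mean_query_loss(w0, alpha))
```

In this toy setting the meta-trained pair `(w0, alpha)` should reach a much lower query loss after one adaptation step than the untrained pair, which mirrors the motivation stated in the abstract: replacing hundreds of fine-tuning iterations with a short, learned update.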

Keywords: Anchor matching; Meta-Learning; Online adaptation; Robotic surgical video; Surgical instrument segmentation.

Publication types

Research Support, Non-U.S. Gov't
Video-Audio Media

MeSH terms

Humans
Learning
Motion
Robotic Surgical Procedures*
Surgical Instruments