Anchor-guided online meta adaptation for fast one-shot instrument segmentation from robotic surgical videos

Med Image Anal. 2021 Dec:74:102240. doi: 10.1016/j.media.2021.102240. Epub 2021 Sep 20.

Abstract

The scarcity of annotated surgical data in robot-assisted surgery (RAS) has motivated prior works to borrow knowledge from related domains and adapt it to achieve promising segmentation results on surgical images. For dense instrument tracking in a robotic surgical video, collecting one initial scene that specifies the target instruments (or parts of tools) is both desirable and feasible during preoperative preparation. In this paper, we study the challenging problem of one-shot instrument segmentation for robotic surgical videos, in which only the first-frame mask of each video is provided at test time, so that a pre-trained model (learned from an easily accessible source) can adapt to the target instruments. Straightforward methods transfer the domain knowledge by fine-tuning the model on each given mask. Such one-shot optimization takes hundreds of iterations, which makes the test runtime infeasible. We present anchor-guided online meta adaptation (AOMA) for this problem. We achieve fast one-shot test-time optimization by meta-learning a good model initialization and learning rates from source videos, avoiding laborious, handcrafted fine-tuning. These two trainable components are optimized in a video-specific task space with a matching-aware loss. Furthermore, we design an anchor-guided online adaptation scheme to counter the performance drop over the course of a robotic surgical sequence: the model is continuously adapted on motion-insensitive pseudo-masks supported by anchor matching. AOMA achieves state-of-the-art results in two practical scenarios, (1) general videos to surgical videos and (2) public surgical videos to in-house surgical videos, while substantially reducing the test runtime.
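The idea of replacing hundreds of fine-tuning iterations with a meta-learned initialization and meta-learned learning rates can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a two-parameter linear model in place of a segmentation network, a mean squared error in place of the matching-aware loss, and hand-picked values (`theta_init`, `alpha`) standing in for quantities that would be meta-learned from source videos. All function and variable names are hypothetical. It shows only the test-time step: a single gradient update with one learning rate per parameter, applied to the data of a new task.

```python
def mse_and_grad(theta, xs, ys):
    """Mean squared error of the linear model y = w*x + b, and its gradient."""
    w, b = theta
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(r * r for r in residuals) / n
    grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    grad_b = 2.0 * sum(residuals) / n
    return loss, [grad_w, grad_b]


def adapt_one_step(theta, alpha, xs, ys):
    """Single gradient step using a separate learning rate per parameter."""
    _, grad = mse_and_grad(theta, xs, ys)
    return [t - a * g for t, a, g in zip(theta, alpha, grad)]


# Assumed values standing in for a meta-learned initialization and
# meta-learned per-parameter learning rates (not taken from the paper).
theta_init = [1.0, 0.0]
alpha = [0.05, 0.1]

# Toy "first frame" supervision for a new task: samples of y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

loss_before, _ = mse_and_grad(theta_init, xs, ys)
theta_adapted = adapt_one_step(theta_init, alpha, xs, ys)
loss_after, _ = mse_and_grad(theta_adapted, xs, ys)

print(loss_before, loss_after, theta_adapted)
```

In a full meta-learning setup, both `theta_init` and `alpha` would themselves be optimized in an outer loop so that this single step yields a low loss across many source tasks; that outer loop is omitted here.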

Keywords: Anchor matching; Meta-Learning; Online adaptation; Robotic surgical video; Surgical instrument segmentation.

Publication types

  • Research Support, Non-U.S. Gov't
  • Video-Audio Media

MeSH terms

  • Humans
  • Learning
  • Motion
  • Robotic Surgical Procedures*
  • Surgical Instruments