Weakly Supervised Temporal Convolutional Networks for Fine-Grained Surgical Activity Recognition

IEEE Trans Med Imaging. 2023 Sep;42(9):2592-2602. doi: 10.1109/TMI.2023.3262847. Epub 2023 Aug 31.

Abstract

Automatic recognition of fine-grained surgical activities, called steps, is a challenging but crucial task for intelligent intra-operative computer assistance. The development of current vision-based activity recognition methods relies heavily on a high volume of manually annotated data. Such data is difficult and time-consuming to generate and requires domain-specific knowledge. In this work, we propose to use coarser and easier-to-annotate activity labels, namely phases, as weak supervision to learn step recognition with fewer step-annotated videos. We introduce a step-phase dependency loss to exploit the weak supervision signal. We then employ a Single-Stage Temporal Convolutional Network (SS-TCN) with a ResNet-50 backbone, trained in an end-to-end fashion from weakly annotated videos, for temporal activity segmentation and recognition. We extensively evaluate the proposed method and show its effectiveness on a large video dataset consisting of 40 laparoscopic gastric bypass procedures and on the public benchmark CATARACTS, which contains 50 cataract surgeries.
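
The abstract does not define the step-phase dependency loss, so the following is only an illustrative sketch of one way such a loss could work, not the authors' formulation. It assumes a known hierarchy in which each phase allows a fixed set of steps, and it penalizes step probability mass that falls outside the set allowed by the annotated phase. The mapping `PHASE_TO_STEPS`, the function names, and the negative-log form are all hypothetical choices made for this example.

```python
import math

# Hypothetical hierarchy (assumption): phase id -> step ids that may occur in it.
PHASE_TO_STEPS = {0: [0, 1], 1: [2, 3, 4]}


def softmax(logits):
    """Numerically stable softmax over a list of per-step logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dependency_loss(step_logits, phase_label, eps=1e-12):
    """Negative log of the step probability mass consistent with the phase.

    The loss is near zero when the predicted steps all belong to the
    annotated phase, and grows as mass shifts to steps of other phases.
    """
    probs = softmax(step_logits)
    allowed_mass = sum(probs[s] for s in PHASE_TO_STEPS[phase_label])
    return -math.log(allowed_mass + eps)


def video_loss(frame_logits, phase_labels):
    """Average the per-frame dependency loss over a video."""
    losses = [dependency_loss(l, p) for l, p in zip(frame_logits, phase_labels)]
    return sum(losses) / len(losses)
```

In a setup like this, frames that carry only a phase label could contribute through `video_loss`, while the smaller set of step-annotated frames would additionally use a standard cross-entropy term on the step predictions. How the two terms are weighted is left open here.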

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Neural Networks, Computer*
  • Surgery, Computer-Assisted*