Spatio-Temporal Causal Transformer for Multi-Grained Surgical Phase Recognition

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul:2022:1663-1666. doi: 10.1109/EMBC48229.2022.9871004.

Abstract

Automatic surgical phase recognition plays a key role in surgical workflow analysis and in the overall optimization of clinical work. In complex surgical procedures, similar inter-class appearance and drastic variability in phase duration keep this a challenging task. In this paper, a spatio-temporal transformer is proposed for online surgical phase recognition at different granularities. To extract rich spatial information, a spatial transformer models the global spatial dependencies within each time index. To overcome the variability in phase duration, a temporal transformer captures multi-scale temporal context across time indexes with a dual pyramid pattern. Our method is thoroughly validated on the public Cholec80 dataset with 7 coarse-grained phases and the CATARACTS2020 dataset with 19 fine-grained phases, outperforming state-of-the-art approaches with 91.4% and 84.2% accuracy, respectively, while using only 24.5M parameters.
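The abstract does not spell out how the temporal transformer restricts attention, so the following is only a minimal sketch of the general idea behind online (causal) recognition with several temporal scales. It assumes that each frame may attend only to itself and to earlier frames, and that different scales correspond to different look-back window lengths. The function names `causal_window_mask` and `multi_scale_masks` and the window sizes are hypothetical and are not taken from the paper; the sketch does not reproduce the dual pyramid pattern itself.

```python
def causal_window_mask(num_frames, window):
    """Return a num_frames x num_frames boolean mask.

    mask[t][s] is True when frame t may attend to frame s, i.e. when s is
    one of the last `window` frames ending at t. No future frame is visible,
    which is what makes the attention usable online.
    """
    return [
        [t - window < s <= t for s in range(num_frames)]
        for t in range(num_frames)
    ]


def multi_scale_masks(num_frames, windows=(2, 4, 8)):
    """Build one causal mask per look-back window (assumed scales)."""
    return {w: causal_window_mask(num_frames, w) for w in windows}


# Example: 6 frames, a short and a longer temporal scale.
masks = multi_scale_masks(6, windows=(2, 4))
for w, mask in masks.items():
    print(f"window={w}")
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
```

Short windows of this kind would suit brief phases, while longer ones would cover phases that last many frames; how the paper actually combines its scales is not stated in the abstract.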

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Workflow