A spatio-temporal network for video semantic segmentation in surgical videos

Maria Grammatikopoulou; Ricardo Sanchez-Matilla; Felix Bragman; David Owen; Lucy Culshaw; Karen Kerr; Danail Stoyanov; Imanol Luengo

doi:10.1007/s11548-023-02971-6

A spatio-temporal network for video semantic segmentation in surgical videos

Int J Comput Assist Radiol Surg. 2024 Feb;19(2):375-382. doi: 10.1007/s11548-023-02971-6. Epub 2023 Jun 22.

Authors

Maria Grammatikopoulou¹, Ricardo Sanchez-Matilla², Felix Bragman², David Owen², Lucy Culshaw², Karen Kerr², Danail Stoyanov^{2

3}, Imanol Luengo²

Affiliations

¹ Medtronic plc, London, UK. maria.grammatikopoulou@medtronic.com.
² Medtronic plc, London, UK.
³ Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK.

PMID: 37347345
DOI: 10.1007/s11548-023-02971-6

Abstract

Purpose: Semantic segmentation in surgical videos has applications in intra-operative guidance, post-operative analytics and surgical education. Models need to provide accurate predictions since temporally inconsistent identification of anatomy can hinder patient safety. We propose a novel architecture for modelling temporal relationships in videos to address these issues.

Methods: We developed a temporal segmentation model that includes a static encoder and a spatio-temporal decoder. The encoder processes individual frames whilst the decoder learns spatio-temporal relationships from frame sequences. The decoder can be used with any suitable encoder to improve temporal consistency.

Results: Model performance was evaluated on the CholecSeg8k dataset and a private dataset of robotic Partial Nephrectomy procedures. Mean Intersection over Union improved by 1.30% and 4.27% respectively for each dataset when the temporal decoder was applied. Our model also displayed improvements in temporal consistency up to 7.23%.

Conclusions: This work demonstrates an advance in video segmentation of surgical scenes with potential applications in surgery with a view to improve patient outcomes. The proposed decoder can extend state-of-the-art static models, and it is shown that it can improve per-frame segmentation output and video temporal consistency.

Keywords: Semantic segmentation; Video segmentation.

MeSH terms

Humans
Learning
Nephrectomy
Postoperative Period
Robotics*
Semantics*