Prediction of remaining surgery duration in laparoscopic videos based on visual saliency and the transformer network

Int J Med Robot. 2024 Apr;20(2):e2632. doi: 10.1002/rcs.2632.

Abstract

Background: Real-time prediction of the remaining surgery duration (RSD) is important for optimal scheduling of resources in the operating room.

Methods: We focus on the intraoperative prediction of RSD from laparoscopic video. An extensive evaluation of seven common deep learning models, a proposed model based on the Transformer architecture (TransLocal), and four baseline approaches is presented. The proposed pipeline combines a CNN-LSTM that extracts features from salient regions within short video segments with a Transformer encoder that uses a local attention mechanism.
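
As a concrete illustration of the pipeline described above, the following is a minimal sketch in PyTorch: a CNN-LSTM summarises each short video segment into a feature vector, and a Transformer encoder with a windowed (local) attention mask regresses the remaining duration. All module names, layer sizes, the window width, and the toy CNN backbone are illustrative assumptions and do not reproduce the paper's actual TransLocal implementation.

```python
# Minimal sketch of a TransLocal-style RSD pipeline (assumed PyTorch implementation).
import torch
import torch.nn as nn


class SegmentFeatureExtractor(nn.Module):
    """CNN-LSTM: per-frame CNN features aggregated over a short video segment."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(               # toy CNN backbone (assumption)
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(32, feat_dim, batch_first=True)

    def forward(self, frames):                  # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(f)                # last hidden state summarises the segment
        return h[-1]                            # (B, feat_dim)


def local_attention_mask(n, window):
    """Boolean mask letting each segment attend only to temporal neighbours."""
    idx = torch.arange(n)
    return (idx[None, :] - idx[:, None]).abs() > window  # True = masked out


class TransLocalSketch(nn.Module):
    """Transformer encoder with local attention + regression head for RSD (minutes)."""

    def __init__(self, feat_dim=128, n_heads=4, n_layers=2, window=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(feat_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(feat_dim, 1)
        self.window = window

    def forward(self, seg_feats):               # seg_feats: (B, N, feat_dim)
        mask = local_attention_mask(seg_feats.size(1), self.window).to(seg_feats.device)
        z = self.encoder(seg_feats, mask=mask)
        return self.head(z[:, -1]).squeeze(-1)  # RSD predicted from the latest segment


if __name__ == "__main__":
    extractor, model = SegmentFeatureExtractor(), TransLocalSketch()
    segments = torch.randn(2, 12, 16, 3, 64, 64)           # 2 videos, 12 segments, 16 frames each
    feats = torch.stack([extractor(s) for s in segments.unbind(1)], dim=1)
    print(model(feats).shape)                               # torch.Size([2])
```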

Results: On the Cholec80 dataset, TransLocal yielded the best performance, with a mean absolute error (MAE) of 7.1 min. For long and short surgeries, the MAE was 10.6 and 4.4 min, respectively. Thirty minutes before the end of surgery, the MAE was 6.2 min overall, and 7.2 and 5.5 min for long and short surgeries, respectively.
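
For clarity, the MAE values above follow the standard definition of mean absolute error, where \(\hat{y}_i\) is the predicted and \(y_i\) the true remaining duration at evaluated time point \(i\) (notation ours):

\[
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|
\]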

Conclusions: The proposed technique achieves state-of-the-art results. In the future, we aim to incorporate intraoperative indicators and pre-operative data.

Keywords: artificial intelligence; cholecystectomy; deep learning; prediction; remaining surgery duration.

MeSH terms

  • Electric Power Supplies
  • Humans
  • Laparoscopy*
  • Operating Rooms