Implementation of artificial intelligence-based computer vision model in laparoscopic appendectomy: validation, reliability, and clinical correlation

Surg Endosc. 2024 Apr 25. doi: 10.1007/s00464-024-10847-2. Online ahead of print.

Abstract

Background: Application of artificial intelligence (AI) in general surgery is evolving. Real-world implementation of an AI-based computer-vision model in laparoscopic appendectomy (LA) is presented. We aimed to evaluate (1) its accuracy in grading complexity and safety adherence, and (2) its clinical correlation with outcomes.

Methods: A retrospective single-center study of 499 consecutive LA videos, captured and analyzed by 'Surgical Intelligence Platform,' Theator Inc. (9/2020-5/2022). Two expert surgeons viewed all videos and manually graded complexity and safety adherence. Automated annotations were compared to the surgeons' assessments, and inter-surgeon agreement was measured. From 7/2021, videos were linked to patients' admission numbers, and data were retrieved from medical records (n = 365). Outcomes were compared between high and low complexity grades.

Results: Low and high complexity grades comprised 74.8% and 25.2% of the 499 videos, respectively. Agreement between the surgeons and the automated annotations was high (76.9-94.4%, kappa 0.77/0.91; p < 0.001) for all annotated complexity grades. It was also high (96.0-99.8%, kappa 0.78/0.87; p < 0.001) for full safety adherence, whereas it was moderate for partial and no safety adherence (32.8-58.8%). Inter-surgeon agreement was high for complexity grading (kappa 0.86, p < 0.001) and safety adherence (kappa 0.88, p < 0.001). Comparing high- to low-grade complexity, preoperative clinical features were similar, except for a larger appendix diameter on imaging (13.4 ± 4.4 vs. 10.5 ± 3.0 mm, p < 0.001). Intraoperative measures were significantly greater in the high-complexity group (p < 0.001), including time to achieve critical view of safety (29.6, IQR 19.1-41.6 vs. 13.7, IQR 8.5-21.1 min), operative duration (45.3, IQR 37.7-65.2 vs. 25.0, IQR 18.3-32.7 min), and intraoperative events (39.4% vs. 5.9%). Postoperative outcomes (7.4% vs. 9.2%), including surgical complications, mortality, and readmissions, were comparable (p = 0.6), except for length of stay (4, IQR 2-5.5 vs. 1, IQR 1-2 days; p < 0.001).
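
For readers unfamiliar with the kappa statistic reported above, the following is a minimal sketch of how chance-corrected agreement between two raters can be computed. The abstract does not state which kappa variant was used; unweighted Cohen's kappa for two raters is assumed here. The function name `cohen_kappa` and the ten toy labels are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (not from the study): complexity grades from two raters.
toy_rater_a = ["low", "low", "low", "high", "high", "low", "low", "high", "low", "low"]
toy_rater_b = ["low", "low", "low", "high", "low", "low", "low", "high", "low", "high"]
toy_kappa = cohen_kappa(toy_rater_a, toy_rater_b)
print(f"raw agreement = 0.80, kappa = {toy_kappa:.2f}")
```

In this toy case the raters agree on 8 of 10 videos (80%), yet kappa is only about 0.52, because most of that agreement would be expected by chance when both raters assign "low" to 7 of 10 items. This is why the raw percentage and kappa are reported side by side.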

Conclusion: The model accurately assesses complexity grading and full safety achievement. It can serve to predict operative time and intraoperative course, whereas no clinical correlation was found regarding postoperative outcomes. Further studies are needed.

Keywords: Appendectomy; Artificial intelligence; Complexity; Computer vision; Laparoscopy; Safety.