CLAD-Net: cross-layer aggregation attention network for real-time endoscopic instrument detection

Xiushun Zhao; Jing Guo; Zhaoshui He; Xiaobing Jiang; Haifang Lou; Depei Li

doi:10.1007/s13755-023-00260-9

Health Inf Sci Syst. 2023 Nov 27;11(1):58. doi: 10.1007/s13755-023-00260-9. eCollection 2023 Dec.

Authors

Xiushun Zhao¹, Jing Guo¹, Zhaoshui He¹, Xiaobing Jiang², Haifang Lou³, Depei Li²

Affiliations

¹ School of Automation, Guangdong University of Technology, Guangzhou, 510006 China.
² Department of Neurosurgery, Sun Yat-Sen University Cancer Center, Guangzhou, 510006 China.
³ Department of Gastroenterology, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, 310006 China.

PMID: 38028959
PMCID: PMC10678866 (available on 2024-12-01)

Abstract

As medical treatments continue to advance rapidly, minimally invasive surgery (MIS) has found extensive application across a wide range of clinical procedures. Accurate identification of medical instruments is vital for understanding the surgical situation and for supporting endoscopic image-guided surgery. However, endoscopic instrument detection is challenging because of the narrow operating space, interfering factors during surgery (e.g., smoke, blood, body fluids), and unavoidable imaging issues (e.g., specular reflection, occlusion, illumination variation). To improve surgical efficiency and safety in MIS, this paper proposes a cross-layer aggregated attention detection network (CLAD-Net) for accurate, real-time detection of endoscopic instruments in complex surgical scenes. A cross-layer aggregation attention module strengthens feature fusion and improves the lateral propagation of feature information. A composite attention mechanism (CAM) extracts contextual information at different scales and models the importance of each channel in the feature map; this mitigates the information loss caused by feature fusion and addresses inconsistent target sizes and low contrast against complex backgrounds. Moreover, the proposed feature refinement module (RM) improves the network's ability to extract target edges and fine details by adaptively adjusting the weights with which features from different layers are fused. CLAD-Net was evaluated on the public laparoscopic dataset Cholec80 and on a neuroendoscopic dataset from Sun Yat-sen University Cancer Center. On these two datasets, CLAD-Net achieves an $AP_{0.5}$ of 98.9% and 98.6%, respectively, outperforming the advanced detection networks it was compared with. A video of the real-time detection is available at https://github.com/A0268/video-demo.

Keywords: Composite attention mechanism; Cross-layer feature aggregation; Refinement module; Surgical instrument detection.
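The $AP_{0.5}$ figures reported in the abstract count a detection as correct when its intersection-over-union (IoU) with a not-yet-matched ground-truth box is at least 0.5. The abstract does not say which AP variant the authors used, so the following is only a minimal pure-Python sketch that assumes PASCAL VOC-style greedy matching and all-point interpolation for a single instrument class. The function names `iou` and `average_precision` and the `(x1, y1, x2, y2)` box format are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[str, float, Box]],  # (image_id, score, box) for one class
    ground_truth: Dict[str, List[Box]],        # image_id -> ground-truth boxes of that class
    iou_thr: float = 0.5,                      # 0.5 gives the AP_0.5 setting
) -> float:
    """Hypothetical VOC-style AP with all-point interpolation (a sketch)."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}

    # Greedy matching: highest-scoring detections claim ground-truth boxes first.
    tp_flags: List[bool] = []
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # A hit on an already-matched box counts as a false positive (duplicate).
        is_tp = best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]
        if is_tp:
            matched[img][best_j] = True
        tp_flags.append(is_tp)

    # Precision and recall after each ranked detection.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # All-point interpolation: precision at rank i becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve (recall only grows at true positives).
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For example, with two ground-truth instruments in one frame and three ranked detections (a hit, a false alarm, then a second hit at IoU of about 0.68), the sketch returns 5/6. COCO-style evaluation instead samples precision at 101 recall points, so values from the two protocols are not strictly interchangeable.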

© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.