FF-ViT: probe orientation regression for robot-assisted endomicroscopy tissue scanning

Int J Comput Assist Radiol Surg. 2024 Apr 10. doi: 10.1007/s11548-024-03113-2. Online ahead of print.

Abstract

Purpose: Probe-based confocal laser endomicroscopy (pCLE) enables visualization of cellular tissue morphology during surgical procedures. To capture high-quality pCLE images during tissue scanning, the probe must stay in close contact with the tissue and remain perpendicular to the tissue surface. Existing robotic pCLE tissue scanning systems, which rely on macroscopic vision, struggle to place the probe accurately at the optimal position on the tissue surface. This motivates regressing longitudinal distance and orientation directly from endomicroscopic vision.

Method: This paper introduces a novel method for automatically regressing the orientation between a pCLE probe and the tissue surface during robotic scanning. The method uses the fast Fourier vision transformer (FF-ViT) to extract local frequency representations, which serve as the input for probe orientation regression. Additionally, the FF-ViT incorporates a blur mapping attention (BMA) module to refine latent representations, which is combined with the pyramid angle regressor (PAR) to precisely estimate probe orientation.
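
As an illustration of what a "local frequency representation" can look like, the following is a minimal sketch, not the FF-ViT implementation described in the paper. It assumes a 2D grayscale frame, splits it into non-overlapping patches, and takes the log-magnitude 2D FFT of each patch. The function name `patch_frequency_tokens` and the patch size of 16 are hypothetical choices made for this example; the BMA and PAR modules are not modeled.

```python
import numpy as np


def patch_frequency_tokens(image, patch=16):
    """Return one log-magnitude FFT spectrum per non-overlapping patch.

    Hypothetical helper for illustration only; it is not the FF-ViT
    architecture. Output shape: (num_patches, patch * patch).
    """
    h, w = image.shape
    gh, gw = h // patch, w // patch
    # Crop so the frame divides evenly into patches.
    img = image[:gh * patch, :gw * patch].astype(np.float64)
    # (gh, patch, gw, patch) -> (gh, gw, patch, patch)
    patches = img.reshape(gh, patch, gw, patch).swapaxes(1, 2)
    # 2D FFT of every patch, with the zero-frequency bin moved to the center.
    spectrum = np.fft.fft2(patches, axes=(-2, -1))
    spectrum = np.fft.fftshift(spectrum, axes=(-2, -1))
    # Log scaling compresses the large dynamic range of the magnitudes.
    tokens = np.log1p(np.abs(spectrum))
    return tokens.reshape(gh * gw, patch * patch)


if __name__ == "__main__":
    frame = np.random.default_rng(0).random((64, 64))  # stand-in for a pCLE frame
    print(patch_frequency_tokens(frame).shape)
```

In a sketch like this, blurred patches concentrate their energy near the center of each spectrum, which is one intuition for why frequency features can carry information about probe-tissue contact and tilt.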

Result: A first-of-its-kind dataset for pCLE probe-tissue orientation (pCLE-PTO) has been created. The performance evaluation demonstrates that the proposed network surpasses other top regression networks in accuracy, stability, and generalizability, while maintaining low computational complexity (1.8G FLOPs) and high inference speed (90 fps).

Conclusion: The performance evaluation study verifies the clinical value of the proposed framework and its potential to be integrated into surgical robotic platforms for intraoperative tissue scanning.

Keywords: Cross-attention; Endomicroscopy; Fast Fourier transform; Transformer.