Registration of preoperative temporal bone CT-scan to otoendoscopic video for augmented-reality based on convolutional neural networks

Eur Arch Otorhinolaryngol. 2024 Jun;281(6):2921-2930. doi: 10.1007/s00405-023-08403-0. Epub 2024 Jan 10.

Abstract

Purpose: Patient-to-image registration is a preliminary step required in surgical navigation based on preoperative images. Conventional approaches rely on human intervention and fiducial markers, which are time-consuming and introduce potential errors. We aimed to develop a fully automatic 2D registration system for augmented reality in ear surgery.

Methods: CT-scans and corresponding oto-endoscopic videos were collected from 41 patients (58 ears) undergoing ear examination (vestibular schwannoma before surgery, profound hearing loss requiring cochlear implantation, suspicion of perilymphatic fistula, and contralateral ears in cases of unilateral chronic otitis media). Two to four images were selected from each case. For the training phase, data from patients (75% of the dataset) and 11 cadaveric specimens were used. Tympanic membranes and malleus handles were contoured on both video images and CT-scans by expert surgeons. The algorithm used a U-Net network to detect the contours of the tympanic membrane and the malleus on both preoperative CT-scans and endoscopic video frames. The contours were then processed and registered using an iterative closest point (ICP) algorithm. Validation was performed on 4 cases and testing on 6 cases. Registration error was measured by overlaying both images and computing the average and Hausdorff distances.
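The final step of the pipeline described above — rigidly aligning the segmented contours with an iterative closest point algorithm — can be sketched in a few lines of NumPy/SciPy. This is a minimal generic 2D ICP (nearest-neighbour correspondences plus a Kabsch/SVD rigid fit), not the authors' implementation; the function name `icp_2d` and its parameters are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_2d(source, target, n_iter=30):
    """Rigidly align a 2D source contour to a target contour.

    source, target: (N, 2) and (M, 2) arrays of contour points.
    Returns the accumulated rotation R, translation t, and the
    transformed source points.
    """
    src = source.copy()
    R_acc, t_acc = np.eye(2), np.zeros(2)
    tree = cKDTree(target)            # fast nearest-neighbour queries
    for _ in range(n_iter):
        _, idx = tree.query(src)      # correspondences: closest target point
        matched = target[idx]
        # Optimal rigid transform for these correspondences (Kabsch / SVD)
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
        R = Vt.T @ np.diag([1.0, d]) @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t           # apply the update to the source points
        R_acc, t_acc = R @ R_acc, R @ t_acc + t
    return R_acc, t_acc, src
```

In practice ICP of this kind only converges when the initial pose is reasonably close, which is consistent with the outlier case reported below, where a large projection-angle discrepancy degraded the result.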

Results: The proposed registration method yielded precision compatible with ear surgery, with a 2D mean overlay error of 0.65 ± 0.60 mm for the incus and 0.48 ± 0.32 mm for the round window. The average Hausdorff distances for these two targets were 0.98 ± 0.60 mm and 0.78 ± 0.34 mm, respectively. An outlier case with higher errors (2.3 mm and 1.5 mm average Hausdorff distance for the incus and round window, respectively) was attributable to a large discrepancy between the projection angle of the reconstructed CT-scan and the video image. The maximum duration of the overall process was 18 s.
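The two error metrics reported above — the mean overlay (average) distance and the Hausdorff distance between corresponding contours — can be computed directly with SciPy. A minimal sketch under the assumption that each contour is an array of 2D points (the function name `contour_errors` is illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import directed_hausdorff

def contour_errors(a, b):
    """Symmetric average and Hausdorff distances between two 2D contours.

    a, b: (N, 2) and (M, 2) arrays of contour points.
    Returns (average_distance, hausdorff_distance).
    """
    # Distance from each point of one contour to its nearest
    # neighbour on the other, in both directions
    d_ab = cKDTree(b).query(a)[0]
    d_ba = cKDTree(a).query(b)[0]
    avg = (d_ab.mean() + d_ba.mean()) / 2.0
    # Hausdorff distance: worst-case nearest-neighbour mismatch
    haus = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
    return avg, haus
```

The average distance summarises typical overlay error across the whole contour, while the Hausdorff distance captures the single worst point, which is why the outlier case stands out most clearly in the Hausdorff figures.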

Conclusions: A fully automatic 2D registration method based on a convolutional neural network was developed and applied to ear surgery. The method relied on neither external fiducial markers nor human intervention for landmark recognition. It was fast, and its precision was compatible with ear surgery.

Keywords: Augmented reality; Ear surgery; Endoscopic video; Registration.

MeSH terms

  • Adult
  • Algorithms
  • Augmented Reality
  • Ear Diseases / diagnostic imaging
  • Ear Diseases / surgery
  • Endoscopy / methods
  • Female
  • Humans
  • Male
  • Malleus / diagnostic imaging
  • Malleus / surgery
  • Middle Aged
  • Neural Networks, Computer*
  • Otologic Surgical Procedures / methods
  • Otoscopy / methods
  • Surgery, Computer-Assisted / methods
  • Temporal Bone / diagnostic imaging
  • Temporal Bone / surgery
  • Tomography, X-Ray Computed* / methods
  • Tympanic Membrane / diagnostic imaging
  • Tympanic Membrane / surgery
  • Video Recording