Registration of preoperative temporal bone CT-scan to otoendoscopic video for augmented-reality based on convolutional neural networks

Eur Arch Otorhinolaryngol. 2024 Jun;281(6):2921-2930. doi: 10.1007/s00405-023-08403-0. Epub 2024 Jan 10.

Abstract

Purpose: Patient-to-image registration is a preliminary step required in surgical navigation based on preoperative images. Conventional approaches rely on human intervention and fiducial markers, which are time-consuming and introduce potential errors. We aimed to develop a fully automatic 2D registration system for augmented reality in ear surgery.

Methods: CT-scans and corresponding oto-endoscopic videos were collected from 41 patients (58 ears) undergoing ear examination (vestibular schwannoma before surgery, profound hearing loss requiring cochlear implantation, suspicion of perilymphatic fistula, and contralateral ears in cases of unilateral chronic otitis media). Two to four images were selected from each case. For the training phase, data from patients (75% of the dataset) and 11 cadaveric specimens were used. Tympanic membranes and malleus handles were contoured on both video images and CT-scans by expert surgeons. The algorithm used a U-Net network to detect the contours of the tympanic membrane and the malleus on both preoperative CT-scans and endoscopic video frames. The contours were then processed and registered using an iterative closest point (ICP) algorithm. Validation was performed on 4 cases and testing on 6 cases. Registration error was measured by overlaying both images and computing the average and Hausdorff distances.
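The final step of the pipeline described above — rigidly aligning the segmented contours with an iterative closest point algorithm — can be sketched in a few lines of NumPy/SciPy. This is a minimal generic 2D ICP (nearest-neighbour correspondences plus a Kabsch/SVD rigid fit), not the authors' implementation; the function name `icp_2d` and its parameters are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_2d(source, target, n_iter=30):
    """Rigidly align a 2D source contour to a target contour.

    source, target: (N, 2) and (M, 2) arrays of contour points.
    Returns the accumulated rotation R, translation t, and the
    transformed source points.
    """
    src = source.copy()
    R_acc, t_acc = np.eye(2), np.zeros(2)
    tree = cKDTree(target)            # fast nearest-neighbour queries
    for _ in range(n_iter):
        _, idx = tree.query(src)      # correspondences: closest target point
        matched = target[idx]
        # Optimal rigid transform for these correspondences (Kabsch / SVD)
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
        R = Vt.T @ np.diag([1.0, d]) @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t           # apply the update to the source points
        R_acc, t_acc = R @ R_acc, R @ t_acc + t
    return R_acc, t_acc, src
```

In practice ICP of this kind only converges when the initial pose is reasonably close, which is consistent with the outlier case reported below, where a large projection-angle discrepancy degraded the result.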

Results: The proposed registration method yielded precision compatible with ear surgery, with a 2D mean overlay error of 0.65 ± 0.60 mm for the incus and 0.48 ± 0.32 mm for the round window. The average Hausdorff distances for these two targets were 0.98 ± 0.60 mm and 0.78 ± 0.34 mm, respectively. An outlier case with higher errors (2.3 mm and 1.5 mm average Hausdorff distance for the incus and round window, respectively) was attributable to a large discrepancy between the projection angle of the reconstructed CT-scan and the video image. The maximum duration of the overall process was 18 s.
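The two error metrics reported above — the mean overlay (average) distance and the Hausdorff distance between corresponding contours — can be computed directly with SciPy. A minimal sketch under the assumption that each contour is an array of 2D points (the function name `contour_errors` is illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import directed_hausdorff

def contour_errors(a, b):
    """Symmetric average and Hausdorff distances between two 2D contours.

    a, b: (N, 2) and (M, 2) arrays of contour points.
    Returns (average_distance, hausdorff_distance).
    """
    # Distance from each point of one contour to its nearest
    # neighbour on the other, in both directions
    d_ab = cKDTree(b).query(a)[0]
    d_ba = cKDTree(a).query(b)[0]
    avg = (d_ab.mean() + d_ba.mean()) / 2.0
    # Hausdorff distance: worst-case nearest-neighbour mismatch
    haus = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
    return avg, haus
```

The average distance summarises typical overlay error across the whole contour, while the Hausdorff distance captures the single worst point, which is why the outlier case stands out most clearly in the Hausdorff figures.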

Conclusions: A fully automatic 2D registration method based on a convolutional neural network was developed and applied to ear surgery. The method relied on neither external fiducial markers nor human intervention for landmark recognition. It was fast, and its precision was compatible with ear surgery.

Keywords: Augmented reality; Ear surgery; Endoscopic video; Registration.

MeSH terms

  • Adult
  • Algorithms
  • Augmented Reality
  • Ear Diseases / diagnostic imaging
  • Ear Diseases / surgery
  • Endoscopy / methods
  • Female
  • Humans
  • Male
  • Malleus / diagnostic imaging
  • Malleus / surgery
  • Middle Aged
  • Neural Networks, Computer*
  • Otologic Surgical Procedures / methods
  • Otoscopy / methods
  • Surgery, Computer-Assisted / methods
  • Temporal Bone / diagnostic imaging
  • Temporal Bone / surgery
  • Tomography, X-Ray Computed* / methods
  • Tympanic Membrane / diagnostic imaging
  • Tympanic Membrane / surgery
  • Video Recording