Purpose: Accurate estimation of the position and orientation (pose) of surgical instruments is crucial for delicate minimally invasive temporal bone surgery. Current techniques either suffer from limited accuracy and/or line-of-sight constraints (conventional tracking systems) or expose the patient to prohibitive ionizing radiation (intra-operative CT). A possible solution is to capture the instrument with a C-arm at irregular intervals and recover the pose from the image.
Methods: i3PosNet infers the position and orientation of instruments from X-ray images using a pose estimation network. The framework operates on localized image patches and predicts pseudo-landmarks; the pose is then reconstructed from these pseudo-landmarks by geometric considerations.
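To make the geometric reconstruction step concrete, the following is a minimal sketch, not the authors' implementation: it assumes, hypothetically, that the predicted pseudo-landmarks include the instrument tip and further points along the instrument axis, and recovers the in-plane position and orientation angle from them. The actual landmark scheme used by i3PosNet is defined in the paper.

```python
import numpy as np

def reconstruct_pose(landmarks: np.ndarray) -> tuple:
    """Recover a 2D position and in-plane orientation from pseudo-landmarks.

    Hypothetical layout assumption: landmarks[0] is the instrument tip and
    the remaining points lie along the instrument axis.
    """
    tip = landmarks[0]
    # Fit the instrument axis through all landmarks: the principal
    # right-singular vector of the centered point cloud is the axis direction.
    centered = landmarks - landmarks.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    # Orient the axis to point from the tip toward the shaft, so the
    # recovered angle is unambiguous.
    if np.dot(landmarks[-1] - tip, axis) < 0:
        axis = -axis
    angle = np.degrees(np.arctan2(axis[1], axis[0]))
    return tip, angle

# Toy usage: five collinear pseudo-landmarks, 3 px apart, at 30 degrees.
pts = np.array([[10.0 + 3.0 * i * np.cos(np.radians(30.0)),
                 20.0 + 3.0 * i * np.sin(np.radians(30.0))] for i in range(5)])
position, orientation = reconstruct_pose(pts)
print(position, orientation)  # -> [10. 20.], ~30.0
```

Fitting the axis through all landmarks, rather than taking the direction between two points, averages out per-landmark prediction noise; this is one plausible design choice, not necessarily the one used in the paper.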
Results: We show that i3PosNet reaches errors of less than 0.05 mm. It outperforms conventional image registration-based approaches, reducing average and maximum errors by at least two thirds. i3PosNet trained on synthetic images generalizes to real X-rays without any further adaptation.
Conclusion: The translation of deep learning-based methods to surgical applications is difficult, because large representative datasets for training and testing are not available. This work empirically shows sub-millimeter pose estimation trained solely on synthetic data.
Keywords: Cochlear implant; Fluoroscopic tracking; Instrument pose estimation; Minimally invasive bone surgery; Modular deep learning; Vestibular schwannoma removal.