Towards fully automated inner ear analysis with deep-learning-based joint segmentation and landmark detection framework

Sci Rep. 2023 Nov 4;13(1):19057. doi: 10.1038/s41598-023-45466-9.

Abstract

Automated analysis of the inner ear anatomy in radiological data, instead of time-consuming manual assessment, is a worthwhile goal that could facilitate preoperative planning and clinical research. We propose a framework encompassing joint semantic segmentation of the inner ear and anatomical landmark detection of the helicotrema, the oval window and the round window. A fully automated pipeline with a single, dual-headed volumetric 3D U-Net was implemented, trained and evaluated using manually labeled in-house datasets from cadaveric specimens ([Formula: see text]) and clinical practice ([Formula: see text]). The model's robustness was further evaluated on three independent open-source datasets ([Formula: see text] scans) consisting of cadaveric specimen scans. For the in-house datasets, Dice scores of [Formula: see text], intersection-over-union scores of [Formula: see text] and average Hausdorff distances of [Formula: see text] and [Formula: see text] voxel units were achieved. The landmark localization task was performed automatically with an average localization error of [Formula: see text] voxel units. Robust, albeit reduced, performance was attained on the catalogue of three open-source datasets. Ablation studies with 43 mono-parametric variations of the basal architecture and training protocol provided task-optimal parameters for both categories. Further ablation studies against single-task variants of the basal architecture showed a clear performance benefit of coupling landmark localization with segmentation, and a dataset-dependent performance impact on segmentation ability.
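As a rough illustration of the dual-headed design described in the abstract (not the authors' released code), the following PyTorch sketch shows a small volumetric encoder-decoder with a shared trunk and two 1x1x1 output heads: per-voxel segmentation logits and one heatmap channel per landmark (helicotrema, oval window, round window). Channel widths, network depth, class count and patch size are illustrative assumptions.

# Minimal sketch of a dual-headed 3D U-Net-style network (assumed layout, not the paper's exact model).
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # Two 3x3x3 convolutions with instance normalization and ReLU.
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )


class DualHeadUNet3D(nn.Module):
    # Shared encoder-decoder with two output heads:
    #   seg_head: per-voxel logits for the inner-ear label(s)
    #   lmk_head: one heatmap per landmark; its peak gives the landmark location
    def __init__(self, in_ch=1, n_classes=2, n_landmarks=3, base=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool3d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose3d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.seg_head = nn.Conv3d(base, n_classes, kernel_size=1)
        self.lmk_head = nn.Conv3d(base, n_landmarks, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.seg_head(d1), self.lmk_head(d1)


if __name__ == "__main__":
    model = DualHeadUNet3D()
    ct_patch = torch.randn(1, 1, 64, 64, 64)  # dummy CT patch (batch, channel, D, H, W)
    seg_logits, landmark_heatmaps = model(ct_patch)
    print(seg_logits.shape, landmark_heatmaps.shape)  # (1, 2, 64, 64, 64) and (1, 3, 64, 64, 64)

In a joint training setup of this kind, the segmentation head would typically be optimized with a Dice or cross-entropy loss and the landmark head with a heatmap regression loss, summed into a single multi-task objective; the specific losses and weightings used in the study are reported in the full paper.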

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cadaver
  • Deep Learning*
  • Ear, Inner* / diagnostic imaging
  • Humans
  • Image Processing, Computer-Assisted / methods