Automatic lung segmentation in routine imaging is primarily a data diversity problem, not a methodology problem

Johannes Hofmanninger; Forian Prayer; Jeanny Pan; Sebastian Röhrich; Helmut Prosch; Georg Langs

doi:10.1186/s41747-020-00173-2

Eur Radiol Exp. 2020 Aug 20;4(1):50. doi: 10.1186/s41747-020-00173-2.

Authors

Johannes Hofmanninger¹, Forian Prayer², Jeanny Pan², Sebastian Röhrich², Helmut Prosch², Georg Langs³

Affiliations

¹ Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Waehringer Guertel, 18-20, Vienna, Austria. johannes.hofmanninger@meduniwien.ac.at.
² Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Waehringer Guertel, 18-20, Vienna, Austria.
³ Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Waehringer Guertel, 18-20, Vienna, Austria. georg.langs@meduniwien.ac.at.

Abstract

Background: Automated segmentation of anatomical structures is a crucial step in image analysis. For lung segmentation in computed tomography, a variety of approaches exists, involving sophisticated pipelines trained and validated on different datasets. However, the clinical applicability of these approaches across diseases remains limited.

Methods: We compared four generic deep learning approaches trained on various datasets and two readily available lung segmentation algorithms. We performed evaluation on routine imaging data with more than six different disease patterns and on three published datasets.

Results: Across the different deep learning approaches, mean Dice similarity coefficients (DSCs) on test datasets varied by no more than 0.02. When trained on a diverse routine dataset (n = 36), a standard approach (U-net) yields a higher DSC (0.97 ± 0.05) than when trained on public datasets such as the Lung Tissue Research Consortium (0.94 ± 0.13, p = 0.024) or Anatomy 3 (0.92 ± 0.15, p = 0.001). Trained on routine data (n = 231) covering multiple diseases, U-net yields a DSC of 0.98 ± 0.03, versus 0.94 ± 0.12 for the reference methods (p = 0.024).
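
For readers unfamiliar with the metric, the DSC between a predicted mask P and a reference mask R is 2|P ∩ R| / (|P| + |R|), ranging from 0 (no overlap) to 1 (identical masks). The following is a minimal sketch in plain Python, not the authors' evaluation code; the function name, the flat-list mask representation, the toy values, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
def dice_coefficient(pred, ref):
    """Dice similarity coefficient between two binary masks.

    Masks are given as flat sequences of 0/1 (or False/True) voxel labels.
    This representation is an assumption chosen for illustration.
    """
    pred = [bool(v) for v in pred]
    ref = [bool(v) for v in ref]
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")

    # |P ∩ R|: voxels labeled as lung in both masks
    overlap = sum(p and r for p, r in zip(pred, ref))
    # |P| + |R|: total foreground voxels across the two masks
    total = sum(pred) + sum(ref)

    # Both masks empty: treated here as perfect agreement (assumed convention)
    if total == 0:
        return 1.0
    return 2.0 * overlap / total


# Toy one-dimensional "masks" of six voxels, not data from the study
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 0, 0]
score = dice_coefficient(predicted, reference)  # 2 * 2 / (3 + 2)
```

In practice the masks would be three-dimensional volumes, flattened before the comparison, and the per-case scores would be averaged to obtain the mean DSC reported above.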

Conclusions: The accuracy and reliability of lung segmentation algorithms on demanding cases rely primarily on the diversity of the training data rather than on model choice. Efforts in developing new datasets and providing trained models to the public are critical. By releasing the trained model under the General Public License 3.0, we aim to foster research on lung diseases by providing a readily available tool for segmentation of pathological lungs.

Keywords: Algorithms; Deep learning; Lung; Reproducibility of results; Tomography (x-ray computed).

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Datasets as Topic
Deep Learning*
Humans
Lung Diseases / diagnostic imaging*
Radiographic Image Interpretation, Computer-Assisted / methods*
Reproducibility of Results
Tomography, X-Ray Computed*

Grants and funding

I 2714/FWF_/Austrian Science Fund FWF/Austria