Harnessing clinical annotations to improve deep learning performance in prostate segmentation

Karthik V Sarma; Alex G Raman; Nikhil J Dhinagar; Alan M Priester; Stephanie Harmon; Thomas Sanford; Sherif Mehralivand; Baris Turkbey; Leonard S Marks; Steven S Raman; William Speier; Corey W Arnold

doi:10.1371/journal.pone.0253829

PLoS One. 2021 Jun 25;16(6):e0253829. doi: 10.1371/journal.pone.0253829. eCollection 2021.

Authors

Karthik V Sarma¹, Alex G Raman¹,², Nikhil J Dhinagar¹,³, Alan M Priester¹, Stephanie Harmon⁴,⁵, Thomas Sanford⁴,⁶, Sherif Mehralivand⁴, Baris Turkbey⁴, Leonard S Marks¹, Steven S Raman¹, William Speier¹, Corey W Arnold¹

Affiliations

¹ University of California, Los Angeles, Los Angeles, CA, United States of America.
² Western University of Health Sciences, Pomona, CA, United States of America.
³ Keck School of Medicine, University of Southern California, Los Angeles, CA, United States of America.
⁴ National Cancer Institute, National Institutes of Health, Bethesda, MD, United States of America.
⁵ Clinical Research Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, United States of America.
⁶ SUNY Upstate Medical Center, Syracuse, NY, United States of America.

Abstract

Purpose: Developing large-scale datasets with research-quality annotations is challenging due to the high cost of refining clinically generated markup into high-precision annotations. We evaluated the direct use of a large dataset with only clinically generated annotations in the development of high-performance segmentation models for small research-quality challenge datasets.

Materials and methods: We used a large retrospective dataset from our institution comprising 1,620 clinically generated segmentations, and two challenge datasets (PROMISE12: 50 patients, ProstateX-2: 99 patients). We trained a 3D U-Net convolutional neural network (CNN) segmentation model using our entire dataset, and used that model as a template to train models on the challenge datasets. We also trained versions of the template model using ablated proportions of our dataset, and evaluated the relative benefit of those templates for the final models. Finally, we trained a version of the template model using an out-of-domain brain cancer dataset, and evaluated the relative benefit of that template for the final models. We used five-fold cross-validation (CV) for all training and evaluation across our entire dataset.
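The five-fold cross-validation and the ablated training proportions described above can be sketched roughly as follows. This is an illustration only, not the authors' code: the helper names `five_fold_splits` and `ablate`, the fixed random seed, the treatment of each of the 1,620 segmentations as one independent case, and the choice to subsample within a fold's training set are all assumptions that the abstract does not confirm.

```python
import random

def five_fold_splits(case_ids, n_folds=5, seed=0):
    """Yield (train_ids, val_ids) pairs so each case is held out exactly once."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled cases into n_folds roughly equal groups.
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val = folds[k]
        train = [c for j, fold in enumerate(folds) if j != k for c in fold]
        yield train, val

def ablate(train_ids, fraction, seed=0):
    """Keep a random subset of the training cases (hypothetical ablation helper)."""
    n_keep = max(1, round(len(train_ids) * fraction))
    return random.Random(seed).sample(train_ids, n_keep)

# Assumption: one case per clinically generated segmentation.
cases = list(range(1620))
splits = list(five_fold_splits(cases))

# A template trained on 5% of the first fold's training cases.
small_train = ablate(splits[0][0], 0.05)
```

Under these assumptions each fold holds out 324 cases and trains on 1,296, so a 5% ablation leaves 65 training cases per fold.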

Results: Our model achieved state-of-the-art performance on our large dataset (mean overall Dice 0.916, average Hausdorff distance 0.135 across CV folds). Using this model as a pre-trained template for refining on two external datasets significantly enhanced performance (30% and 49% enhancement in Dice scores, respectively). Mean overall Dice and mean average Hausdorff distance were 0.912 and 0.15 for the ProstateX-2 dataset, and 0.852 and 0.581 for the PROMISE12 dataset. Using even small quantities of data to train the template enhanced performance, with significant improvements using 5% or more of the data.
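For readers unfamiliar with the two reported metrics, the sketch below computes a Dice score and an average Hausdorff distance on a toy pair of masks. It is a minimal illustration under stated assumptions: masks are represented as sets of voxel coordinates, distances are in voxel units, and "average Hausdorff distance" is taken to be the symmetric mean of nearest-neighbour distances, which is one common definition. The abstract does not state which variant or units the authors used, and the function names here are hypothetical.

```python
import math

def dice(pred, truth):
    """Dice overlap between two masks given as sets of voxel coordinates."""
    if not pred and not truth:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def average_hausdorff(a, b):
    """Symmetric mean of nearest-neighbour distances (assumes non-empty masks)."""
    def directed(src, dst):
        # Mean distance from each point in src to its closest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return (directed(a, b) + directed(b, a)) / 2

# Toy example: a 4x4 square and the same square shifted by one voxel.
truth = {(x, y, 0) for x in range(4) for y in range(4)}
pred = {(x + 1, y, 0) for x in range(4) for y in range(4)}

dice_score = dice(pred, truth)            # 12 shared voxels out of 16 + 16
ahd_score = average_hausdorff(pred, truth)
```

In this toy case the Dice score is 0.75 and the average Hausdorff distance is 0.25 voxels. The brute-force nearest-neighbour search is quadratic in the number of voxels, so real 3D volumes would need a distance transform or a spatial index instead.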

Conclusion: We trained a state-of-the-art model using unrefined clinical prostate annotations and found that its use as a template model significantly improved performance in other prostate segmentation tasks, even when trained with only 5% of the original dataset.

Publication types

Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't

MeSH terms

Data Curation*
Databases, Factual*
Deep Learning*
Humans
Male
Prostate / diagnostic imaging*
Retrospective Studies
Tomography, X-Ray Computed*
