Effects of sample size and data augmentation on U-Net-based automatic segmentation of various organs

Radiol Phys Technol. 2021 Sep;14(3):318-327. doi: 10.1007/s12194-021-00630-6. Epub 2021 Jul 12.

Abstract

Deep learning has demonstrated high efficacy for automatic segmentation in contour delineation, which is crucial in radiation therapy planning. However, the collection, labeling, and management of medical imaging data can be challenging. This study aims to elucidate the effects of sample size and data augmentation on the automatic segmentation of computed tomography images using U-Net, a deep learning method. For the chest and pelvic regions, 232 and 556 cases are evaluated, respectively. We investigate multiple conditions by varying the combined size of the training and validation datasets over a broad range: 10-200 cases for the chest region and 10-500 cases for the pelvic region. A U-Net is constructed, and horizontal-flip data augmentation, which adds a left-right mirrored copy of each image and thereby doubles the number of images, is compared with no augmentation for each training session. For the lungs at every sample size, and for the prostate, bladder, and rectum at more than 100 cases, adding horizontal-flip data augmentation is almost as effective as doubling the number of cases. The slope of the Dice similarity coefficient (DSC) with respect to the number of cases decreases rapidly in all organs until approximately 100 cases, stabilizes after 200 cases, and shows minimal changes as the number of cases is increased further. The DSCs stabilize at a smaller sample size with the incorporation of data augmentation in all organs except the heart. This finding is applicable to the automation of radiation therapy for rare cancers, where large datasets may be difficult to obtain.
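As an illustration of the two technical ingredients named in the abstract, the following minimal Python sketch shows horizontal-flip data augmentation and the Dice similarity coefficient, DSC = 2|A and B| / (|A| + |B|), on small binary masks. It is not the authors' implementation. The function names, the nested-list image representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration. The abstract also does not state how left and right organ labels were handled after flipping, so the sketch simply mirrors a single binary mask.

```python
def horizontal_flip(image):
    """Return a left-right mirrored copy of a 2D image given as a list of rows."""
    return [list(reversed(row)) for row in image]


def augment_with_flips(images, masks):
    """Append a mirrored copy of every image and mask, doubling the dataset size."""
    aug_images = list(images) + [horizontal_flip(im) for im in images]
    aug_masks = list(masks) + [horizontal_flip(m) for m in masks]
    return aug_images, aug_masks


def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks of 0s and 1s."""
    intersection = sum(
        p * t
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    total = sum(sum(row) for row in pred) + sum(sum(row) for row in truth)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks (hypothetical data, not from the study).
truth = [[0, 1, 1],
         [0, 1, 0]]
pred = [[0, 1, 0],
        [0, 1, 1]]

images, masks = augment_with_flips([truth], [truth])
print(len(images))                              # 2: the original plus its mirror
print(round(dice_coefficient(pred, truth), 3))  # 0.667
```

Flipping the prediction and the reference together leaves the DSC unchanged, so the mirrored cases can be scored with the same metric as the originals.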

Keywords: Automatic segmentation; Data augmentation; Radiation therapy; Sample size.

MeSH terms

  • Humans
  • Lung
  • Male
  • Prostate*
  • Sample Size
  • Thorax
  • Tomography, X-Ray Computed*