Explainability and controllability of patient-specific deep learning with attention-based augmentation for markerless image-guided radiotherapy

Med Phys. 2023 Jan;50(1):480-494. doi: 10.1002/mp.16095. Epub 2022 Nov 24.

Abstract

Background: We previously reported the concept of patient-specific deep learning (DL) for real-time markerless tumor segmentation in image-guided radiotherapy (IGRT). The method aimed to control the attention of convolutional neural networks (CNNs) through artificial differences in co-occurrence probability (CoOCP) in the training datasets, that is, focusing CNN attention on soft tissues while ignoring bones. However, the effectiveness of this attention-based data augmentation has not been confirmed with explainability techniques, and the feasibility of tumor segmentation in clinical kilovolt (kV) X-ray fluoroscopic (XF) images has not been validated against reasonable ground truths.

Purpose: The first aim of this study was to present evidence that the proposed method makes DL behavior explainable and controllable. The second aim was to validate real-time lung tumor segmentation in clinical kV XF images for IGRT.

Methods: This retrospective study included 10 patients with lung cancer. Patient-specific and XF-angle-specific image pairs comprising digitally reconstructed radiographs (DRRs) and projected-clinical-target-volume (pCTV) images were calculated from four-dimensional computed tomography data and treatment planning information. The training datasets were primarily augmented by random overlay (RO) and noise injection (NI): RO aims to differentiate the positional CoOCP of soft tissues and bones, and NI aims to create a difference in the frequency of occurrence of local and global image features. The CNNs for each patient and angle were automatically optimized in the DL training stage to transform the training DRRs into pCTV images. In the inference stage, the trained CNNs transformed the test XF images into pCTV images, thus identifying target positions and shapes.
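The two augmentations can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names, the translation range, and the blending weight are illustrative assumptions. RO overlays randomly shifted bone-like content so its position no longer co-occurs with the target, and NI adds Gaussian noise to suppress reliance on local pixel patterns:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_overlay(drr, overlay_bank, alpha=0.5):
    """Blend a randomly chosen, randomly shifted overlay onto the DRR so
    that bone-like structures appear at varying positions, decoupling
    their co-occurrence with the soft-tissue target (illustrative RO)."""
    overlay = overlay_bank[rng.integers(len(overlay_bank))]
    shift = rng.integers(-8, 9, size=2)              # hypothetical shift range
    shifted = np.roll(overlay, shift, axis=(0, 1))
    return (1 - alpha) * drr + alpha * shifted

def noise_injection(image, sigma=0.05):
    """Add Gaussian noise to suppress local pixel patterns, biasing the
    network toward global image features (illustrative NI)."""
    return image + rng.normal(0.0, sigma, size=image.shape)

# Toy usage: augment one 64x64 DRR with a bank of four overlays.
drr = rng.random((64, 64))
bank = rng.random((4, 64, 64))
augmented = noise_injection(random_overlay(drr, bank))
```

In practice such augmented DRR/pCTV pairs would be generated on the fly during training, so each epoch sees bones at different positions relative to the target.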

Results: The visual analysis of DL attention heatmaps for a test image demonstrated that our method focused CNN attention on soft tissues and global image features rather than on bones and local features. The processing time for each patient-and-angle-specific dataset was ∼30 min in the training stage and 8 ms/frame in the inference stage. The estimated three-dimensional 95th-percentile tracking error, Jaccard index, and Hausdorff distance for the 10 patients were 1.3-3.9 mm, 0.85-0.94, and 0.6-4.9 mm, respectively.
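Of the reported metrics, the Jaccard index is the simplest to state precisely: the intersection-over-union of the predicted and ground-truth segmentation masks. A minimal sketch (not from the paper; the empty-union convention is an assumption):

```python
import numpy as np

def jaccard_index(pred, gt):
    """Intersection-over-union between two binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union

# Toy example: two 4x4 square masks overlapping in a 3x3 region,
# so the index is 9 / (16 + 16 - 9) = 9/23.
a = np.zeros((10, 10), dtype=bool); a[2:6, 2:6] = True
b = np.zeros((10, 10), dtype=bool); b[3:7, 3:7] = True
iou = jaccard_index(a, b)
```

A value of 0.85-0.94, as reported above, therefore means the predicted and planned target outlines overlapped in the large majority of their combined area.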

Conclusions: The proposed attention-based data augmentation with both RO and NI made the CNN behavior more explainable and more controllable. The results obtained demonstrated the feasibility of real-time markerless lung tumor segmentation in kV XF images for IGRT.

Keywords: IGRT; attention-based data augmentation; kV X-ray fluoroscopy; patient-specific deep learning; tumor tracking and segmentation.

MeSH terms

  • Deep Learning*
  • Humans
  • Image Processing, Computer-Assisted / methods
  • Lung Neoplasms* / radiotherapy
  • Neural Networks, Computer
  • Radiotherapy, Image-Guided* / methods
  • Retrospective Studies