Clinical VMAT machine parameter optimization for localized prostate cancer using deep reinforcement learning

Med Phys. 2024 Apr 26. doi: 10.1002/mp.17100. Online ahead of print.

Abstract

Background: Volumetric modulated arc therapy (VMAT) machine parameter optimization (MPO) remains computationally expensive and sensitive to input dose objectives, creating challenges for manual and automatic planning. Reinforcement learning (RL) involves machine learning through extensive trial-and-error and has demonstrated performance exceeding that of humans and existing algorithms in several domains.

Purpose: To develop and evaluate an RL approach for VMAT MPO for localized prostate cancer to rapidly and automatically generate deliverable VMAT plans for a clinical linear accelerator (linac) and compare resultant dosimetry to clinical plans.

Methods: We extended our previous RL approach to enable VMAT MPO of a 3D beam model for a clinical linac through a policy network. The network accepts an input state describing the current control point and predicts continuous machine parameters for the next control point, which are used to update the input state, repeating until plan termination. RL training was conducted to minimize a dose-based cost function for a prescription of 60 Gy in 20 fractions using CT scans and contours from 136 retrospective localized prostate cancer patients, 20 of whom had existing plans that were used to initialize training. Data augmentation was employed to mitigate over-fitting, and parameter exploration was achieved using Gaussian perturbations. Following training, RL VMAT was applied to an independent cohort of 15 patients, and the resultant dosimetry was compared to clinical plans. We also combined the RL approach with our clinical treatment planning system (TPS) to automate final plan refinement, creating the potential for manual review and edits as required for clinical use.
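The control-point loop and the Gaussian exploration step described in the Methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy linear `policy`, the fixed number of control points used as the termination rule, the scalar `toy_cost` standing in for the dose-based cost function, the choice to perturb the policy weights, and the "keep the best candidate" rule, which is a simple random search rather than the authors' RL update.

```python
import math
import random


def policy(weights, state):
    """Hypothetical linear policy with tanh squashing: maps the current
    state to continuous machine parameters for the next control point."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]


def rollout(weights, initial_state, n_control_points):
    """Generate a plan one control point at a time; each prediction
    becomes the input state for the next step (assumed state update)."""
    state = list(initial_state)
    plan = []
    for _ in range(n_control_points):  # assumed termination: fixed length
        params = policy(weights, state)
        plan.append(params)
        state = params
    return plan


def toy_cost(plan, target):
    """Stand-in for a dose-based cost: mean squared deviation of every
    machine parameter from a scalar target value."""
    errors = [(p - target) ** 2 for control_point in plan for p in control_point]
    return sum(errors) / len(errors)


def perturb(weights, sigma, rng):
    """Add independent zero-mean Gaussian noise to every policy weight."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]


def explore(weights, initial_state, target, n_control_points,
            n_candidates, sigma, rng):
    """Evaluate several Gaussian-perturbed policies and keep the one with
    the lowest cost (a simple random search, not the paper's algorithm)."""
    best_weights = weights
    best_cost = toy_cost(rollout(weights, initial_state, n_control_points), target)
    for _ in range(n_candidates):
        candidate = perturb(weights, sigma, rng)
        cost = toy_cost(rollout(candidate, initial_state, n_control_points), target)
        if cost < best_cost:
            best_weights, best_cost = candidate, cost
    return best_weights, best_cost


if __name__ == "__main__":
    rng = random.Random(0)
    start_weights = [[0.0] * 3 for _ in range(3)]
    _, cost = explore(start_weights, [1.0, 1.0, 1.0], 0.5, 5, 8, 0.1, rng)
    print(f"best toy cost after one exploration round: {cost:.4f}")
```

In this sketch each exploration round generates one plan per perturbed candidate, so the cost of a round scales with the number of candidates times the number of control points.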

Results: RL training was conducted for 5000 iterations, producing 40 000 plans during exploration. Mean ± SD execution time to produce deliverable VMAT plans in the test cohort was 3.3 ± 0.5 s; these plans were then automatically refined in the TPS, taking an additional 77.4 ± 5.8 s. When normalized to provide equivalent target coverage, the RL+TPS plans provided a similar mean ± SD overall maximum dose of 63.2 ± 0.6 Gy and a lower mean rectum dose of 17.4 ± 7.4 Gy, compared to 63.9 ± 1.5 Gy (p = 0.061) and 21.0 ± 6.0 Gy (p = 0.024), respectively, for the clinical plans.

Conclusions: An approach for VMAT MPO using RL for a clinical linac model was developed and applied to automatically generate deliverable plans for localized prostate cancer patients; when combined with the clinical TPS, it shows potential to rapidly generate high-quality plans. The RL VMAT approach shows promise for discovering advanced linac control policies through trial-and-error. Algorithm limitations and future directions are identified and discussed.

Keywords: VMAT; artificial intelligence; automation; deep learning; prostate cancer; reinforcement learning.

Grants and funding