Automatic evaluation of contours in radiotherapy planning utilising conformity indices and machine learning

Phys Imaging Radiat Oncol. 2020 Dec 1;16:149-155. doi: 10.1016/j.phro.2020.10.008. eCollection 2020 Oct.

Abstract

Background and purpose: Peer review of Target Volume (TV) and Organ at Risk (OAR) contours in radiotherapy planning is typically conducted visually, which can be time-consuming and subject to interobserver variation. This study investigated the automatic evaluation of contours using conformity indices and supervised machine learning.

Methods: A total of 393 contours from 253 Stereotactic Ablative Body Radiotherapy (SABR) benchmark cases (adrenal gland, liver, pelvic lymph node and spine), delineated by 132 clinicians from 25 centres, were visually evaluated for conformity against gold standard contours. Each contour was scored as "pass" or "fail" on visual peer review, and six Conformity Indices (CIs) were applied. The CI values for each contour were mapped to its pass/fail score and used to train supervised machine learning models. A 5-fold cross-validation method was employed to determine the predictive accuracy of each model.
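The pipeline described in the Methods (conformity indices computed against a gold standard contour, paired with visual pass/fail labels, then scored by 5-fold cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation. The abstract does not name the six CIs, so the sketch uses three common overlap measures (Dice similarity coefficient, Jaccard index and sensitivity); contours are modelled as hypothetical sets of voxel indices; and logistic regression (one of the model types named in the Results) is fitted by plain gradient descent. The helper names (`features`, `train_logistic`, `cross_val_accuracy`, `cube`) and the demo data and labels are all hypothetical.

```python
import math
import random

# Hypothetical representation: a contour is a set of (x, y, z) voxel indices.
Voxels = set[tuple[int, int, int]]


def dice(test: Voxels, ref: Voxels) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(test) + len(ref)
    return 2 * len(test & ref) / total if total else 1.0


def jaccard(test: Voxels, ref: Voxels) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    union = len(test | ref)
    return len(test & ref) / union if union else 1.0


def sensitivity(test: Voxels, ref: Voxels) -> float:
    """Fraction of the gold standard voxels covered by the test contour."""
    return len(test & ref) / len(ref) if ref else 1.0


def features(test: Voxels, ref: Voxels) -> list[float]:
    """Map one contour to a feature vector of conformity index values."""
    return [dice(test, ref), jaccard(test, ref), sensitivity(test, ref)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: list[list[float]], y: list[int],
                   lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Fit logistic regression by full-batch gradient descent on the mean
    log-loss. Returns the weights; the last entry is the bias term."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w: list[float], x: list[float]) -> int:
    """1 = predicted "pass", 0 = predicted "fail"."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1]) >= 0.5)


def cross_val_accuracy(X: list[list[float]], y: list[int],
                       k: int = 5, seed: int = 0) -> float:
    """Plain (unstratified) k-fold cross-validation accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = train_logistic([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in fold)
    return correct / len(X)


def cube(origin: tuple[int, int, int], size: int = 10) -> Voxels:
    """Hypothetical test shape: an axis-aligned cube of voxels."""
    ox, oy, oz = origin
    return {(ox + i, oy + j, oz + k)
            for i in range(size) for j in range(size) for k in range(size)}


if __name__ == "__main__":
    # Synthetic demo data, NOT study data: the gold standard is a 10-voxel
    # cube and each "clinician" contour is the same cube shifted along one
    # axis. The labels (pass if shifted by at most 3 voxels) are invented.
    gold = cube((0, 0, 0))
    X, y = [], []
    for shift in range(10):
        for axis in range(3):
            origin = [0, 0, 0]
            origin[axis] = shift
            X.append(features(cube(tuple(origin)), gold))
            y.append(int(shift <= 3))
    print(f"5-fold CV accuracy: {cross_val_accuracy(X, y):.2f}")
```

Unstratified folds keep the sketch short; with imbalanced pass/fail counts, stratified folds (shuffling and splitting each class separately) would be the more careful choice.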

Results: The stomach structure produced the models with the highest overall predictive accuracy (96%, using Support Vector Machine and Ensemble models), whilst the liver gross tumour volume (GTV) produced the models with the lowest predictive accuracy (76%, using Logistic Regression). Predictive accuracies across all models ranged from 68% to 96% (68-87% for TVs and 71-96% for OARs).

Conclusions: Although a final visual review by an experienced clinician is still required, the automatic contour evaluation method could reduce the time needed for benchmark case reviews by identifying gross contouring errors. The method could be implemented to support departmental training and the continuous assessment of clinical staff's outlining within the peer-review process, helping to reduce interobserver variability in contouring and to improve the interpretation of radiological anatomy.

Keywords: Conformity index; Delineation; Interobserver variation; Machine learning; Quality assurance; SABR.