Virtual Raters for Reproducible and Objective Assessments in Radiology

Jens Kleesiek; Jens Petersen; Markus Döring; Klaus Maier-Hein; Ullrich Köthe; Wolfgang Wick; Fred A Hamprecht; Martin Bendszus; Armin Biller

doi:10.1038/srep25007

Sci Rep. 2016 Apr 27;6:25007. doi: 10.1038/srep25007.

Authors

Jens Kleesiek¹,²,³,⁴, Jens Petersen¹,², Markus Döring¹, Klaus Maier-Hein², Ullrich Köthe³, Wolfgang Wick⁵, Fred A Hamprecht³, Martin Bendszus¹, Armin Biller¹,⁴

Affiliations

¹ University of Heidelberg, Department of Neuroradiology, Heidelberg, Germany.
² German Cancer Research Center, Junior Group Medical Image Computing, Heidelberg, Germany.
³ University of Heidelberg, HCI/IWR, Heidelberg, Germany.
⁴ German Cancer Research Center, Division of Radiology, Heidelberg, Germany.
⁵ University of Heidelberg, Department of Neurology, Heidelberg, Germany.

Abstract

Volumetric measurements in radiologic images are important for monitoring tumor growth and treatment response. To make these measurements more reproducible and objective, we introduce the concept of virtual raters (VRs). A virtual rater is obtained by combining the knowledge of machine-learning algorithms, trained on past annotations from multiple human raters, with the instantaneous rating of one human expert. The expert is thus virtually guided by several other experts. To evaluate the approach, we perform experiments with multi-channel magnetic resonance imaging (MRI) data sets. In addition to gross tumor volume (GTV), we investigate subcategories such as edema, contrast-enhancing tumor and non-enhancing tumor. The first data set consists of N = 71 longitudinal follow-up scans of 15 patients with glioblastoma (GB). The second data set comprises N = 30 scans of low- and high-grade gliomas. For comparison we computed the Pearson correlation, the intra-class correlation coefficient (ICC) and the Dice score. Virtual raters always improved inter- and intra-rater agreement. Comparing the 2D Response Assessment in Neuro-Oncology (RANO) measurements with the volumetric measurements of the virtual raters yields a deviating rating in one-third of the cases. Hence, we believe that our approach will have an impact on the evaluation of clinical studies as well as on routine imaging diagnostics.
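As an illustration of two of the agreement measures named in the abstract, the following minimal Python sketch computes a Dice score between two binary segmentation masks and a Pearson correlation between two series of volume measurements, and derives a volume from a mask. It is not the authors' implementation: the function names, the flat-list mask representation, the voxel size and all numbers are hypothetical and chosen only for illustration. The ICC is left out because the abstract does not state which ICC variant was used.

```python
from math import sqrt


def dice_score(mask_a, mask_b):
    """Dice overlap 2|A∩B| / (|A| + |B|) of two binary masks (flat 0/1 sequences)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


def volume_ml(mask, voxel_volume_mm3):
    """Segmented volume in millilitres: voxel count times voxel volume (1 ml = 1000 mm^3)."""
    return sum(1 for v in mask if v) * voxel_volume_mm3 / 1000.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: two raters' masks over the same 8 voxels.
rater_1 = [0, 1, 1, 1, 0, 0, 1, 0]
rater_2 = [0, 1, 1, 0, 0, 0, 1, 1]
print("Dice:", dice_score(rater_1, rater_2))

# Hypothetical voxel volume of 2 mm^3.
print("Volume (ml):", volume_ml(rater_1, 2.0))

# Hypothetical tumor volumes (ml) measured by two raters on four scans.
volumes_a = [10.0, 12.0, 15.0, 20.0]
volumes_b = [11.0, 12.5, 16.0, 19.0]
print("Pearson r:", pearson_r(volumes_a, volumes_b))
```

In this sketch, Dice compares the spatial overlap of two segmentations of one scan, while Pearson correlation compares volume measurements across scans. The two measures therefore answer different questions about rater agreement.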

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Glioma / diagnostic imaging*
Humans
Longitudinal Studies
Machine Learning
Neoplasm Grading / methods*
Radiology / methods*