Validation of algorithmic CT image quality metrics with preferences of radiologists

Med Phys. 2019 Nov;46(11):4837-4846. doi: 10.1002/mp.13795. Epub 2019 Sep 20.

Abstract

Purpose: Automated assessment of perceptual image quality on clinical Computed Tomography (CT) data by computer algorithms has the potential to greatly facilitate data-driven monitoring and optimization of CT image acquisition protocols. Applying these techniques in clinical operation requires knowing how the output of the computer algorithms corresponds to clinical expectations. This study addressed the need to validate algorithmic image quality measurements on clinical CT images against the preferences of radiologists and to determine the clinically acceptable range of algorithmic measurements for abdominal CT examinations.

Materials and methods: Algorithmic measurements of image quality metrics (organ HU, noise magnitude, and clarity) were performed on a clinical CT image dataset, with supplemental measures of noise power spectrum from phantom images, using techniques developed previously. The algorithmic measurements were compared to clinical expectations of image quality in an observer study with seven radiologists. Sets of CT liver images were selected from the dataset such that images in the same set varied in one metric at a time. These sets of images were shown via a web interface to one observer at a time. First, the observer rank-ordered the CT images in a set according to his/her preference for the varying metric. The observer then selected his/her preferred acceptable range of the metric within the ranked images. The agreement between algorithmic and observer rankings of image quality was investigated, and the clinically acceptable image quality in terms of algorithmic measurements was determined.
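
The comparison of algorithmic and observer rankings can be illustrated with a minimal sketch. The abstract does not state which rank-agreement statistic was used; Spearman's rank correlation is assumed here purely for illustration, and the function names and all values below are hypothetical, not data from the study.

```python
def ranks(values):
    """Return 1-based ranks of values in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of equal length."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical algorithmic noise magnitudes (HU) for five images in one set.
algorithmic_noise = [12.0, 18.5, 25.1, 31.0, 40.2]
# Hypothetical ranking of the same images by one observer (1 = least noisy).
observer_rank = [1, 2, 4, 3, 5]

rho = spearman_rho(ranks(algorithmic_noise), observer_rank)
```

In this made-up set the observer swaps two neighbouring images, so the agreement is high but falls short of 1. Averaging such per-set values over sets and observers is one plausible way to obtain an overall agreement figure, though the abstract does not specify the aggregation.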

Results: The overall rank-order agreements between algorithmic and observer assessments were 0.90, 0.98, and 1.00 for noise magnitude, liver parenchyma HU, and clarity, respectively, indicating strong agreement between the algorithmic and observer assessments of image quality. Clinically acceptable thresholds (median) of algorithmic metric values were (17.8, 32.6) HU for noise magnitude, (92.1, 131.9) HU for liver parenchyma HU, and (0.47, 0.52) for clarity.
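
As a rough sketch of how a median threshold pair could be derived, the lower and upper bounds selected by each observer can be summarized separately. Taking the median of each bound across observers is an assumption about the procedure, and the ranges below are hypothetical placeholders, not the study's observer data.

```python
from statistics import median


def median_acceptable_range(observer_ranges):
    """Median lower and upper bound across per-observer (low, high) ranges."""
    lows = [low for low, _ in observer_ranges]
    highs = [high for _, high in observer_ranges]
    return median(lows), median(highs)


# Hypothetical acceptable noise ranges (HU) chosen by seven observers.
hypothetical_ranges = [
    (15, 30), (18, 33), (17, 32), (20, 35), (16, 31), (19, 34), (18, 32),
]

low, high = median_acceptable_range(hypothetical_ranges)
```

Using the median rather than the mean keeps a single unusually strict or lenient observer from shifting the reported range.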

Conclusions: The observer study results indicated that these algorithms can robustly assess the perceptual quality of clinical CT images in an automated fashion. Clinically acceptable ranges of algorithmic measurements were determined. The correspondence of these image quality assessment algorithms to clinical expectations paves the way toward establishing diagnostic reference levels in terms of clinically acceptable perceptual image quality and data-driven optimization of CT image acquisition protocols.

Keywords: CT image quality; automated assessment; clinical images; diagnostic reference level; observer study; validation of algorithmic metrics.

Publication types

  • Validation Study

MeSH terms

  • Algorithms*
  • Humans
  • Image Processing, Computer-Assisted / methods*
  • Quality Control
  • Radiologists*
  • Tomography, X-Ray Computed*

Grants and funding