Robustness of Deep Learning Algorithm to Varying Imaging Conditions in Detecting Low Contrast Objects in Computed Tomography Phantom Images: In Comparison to 12 Radiologists

Hae Young Kim; Kyeorye Lee; Won Chang; Youngjune Kim; Sungsoo Lee; Dong Yul Oh; Leonard Sunwoo; Yoon Jin Lee; Young Hoon Kim

doi:10.3390/diagnostics11030410

Diagnostics (Basel). 2021 Feb 28;11(3):410. doi: 10.3390/diagnostics11030410.

Authors

Hae Young Kim¹, Kyeorye Lee², Won Chang¹, Youngjune Kim¹, Sungsoo Lee³, Dong Yul Oh², Leonard Sunwoo¹, Yoon Jin Lee¹, Young Hoon Kim¹

Affiliations

¹ Department of Radiology, Seoul National University Bundang Hospital, Seongnam-si, Gyeonggi-do 13620, Korea.
² Interdisciplinary Program in Bioengineering, Seoul National University, Seoul 08826, Korea.
³ PROMEDIS, Seocho-gu, Seoul 06714, Korea.

Abstract

We compared the performance of a deep learning algorithm (DLA) with that of radiologists in detecting low-contrast objects in CT phantom images under various imaging conditions. For training, 10,000 images were created using the American College of Radiology CT phantom as the background. In half of the images, objects 3-20 mm in size with a contrast difference of 5-30 HU were generated at random locations. Binary responses were used as the ground truth. For testing, 640 images of the Catphan® phantom were used, half of which contained objects of either 5 or 9 mm in size with a contrast difference of 10 HU. Twelve radiologists rated the presence of objects on a five-point scale. The performances of the DLA and the radiologists were compared across imaging conditions in terms of the area under the receiver operating characteristic curve (AUC), using multi-reader multi-case AUC and Hanley and McNeil tests. A post-hoc analysis using bootstrapping verified that the DLA was less affected by changing imaging conditions. The AUC of the DLA was consistently higher than those of the radiologists across imaging conditions (p < 0.0001). The DLA outperformed the radiologists and showed more robust performance under varying imaging conditions.
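As an illustration of the AUC comparison described above, the following is a minimal sketch of how an AUC can be computed from five-point ratings, together with the standard error approximation published by Hanley and McNeil (1982). It is not the authors' code. The abstract does not specify the exact test procedure, so the function names `auc_from_ratings` and `hanley_mcneil_se` are hypothetical, and the ratings in the example are made up for illustration only.

```python
import numpy as np


def auc_from_ratings(present, absent):
    """Empirical AUC: the probability that an object-present image receives
    a higher rating than an object-absent image, counting ties as one half."""
    present = np.asarray(present, dtype=float)
    absent = np.asarray(absent, dtype=float)
    # Pairwise differences between every present/absent rating pair.
    diff = present[:, None] - absent[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def hanley_mcneil_se(auc, n_present, n_absent):
    """Standard error of an AUC using the Hanley and McNeil (1982)
    approximation, which depends only on the AUC and the group sizes."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_present - 1) * (q1 - auc ** 2)
        + (n_absent - 1) * (q2 - auc ** 2)
    ) / (n_present * n_absent)
    return float(np.sqrt(var))


# Made-up five-point ratings (illustration only, not data from the study).
present_ratings = [5, 4, 4, 3, 5, 2]  # images that contain an object
absent_ratings = [1, 2, 3, 1, 2, 4]   # images without an object

auc = auc_from_ratings(present_ratings, absent_ratings)
se = hanley_mcneil_se(auc, len(present_ratings), len(absent_ratings))
print(f"AUC = {auc:.3f}, SE = {se:.3f}")
```

Comparing two AUCs measured on the same images, as in a reader-versus-algorithm study, additionally requires accounting for the correlation between the two sets of ratings; that step is omitted from this sketch.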

Keywords: X-ray computed; artificial intelligence; deep learning; imaging; phantoms; tomography.

Grants and funding

NRF-2018R1C1B6007999/National Research Foundation of Korea