Performance of a Deep Learning Algorithm in Detecting Osteonecrosis of the Femoral Head on Digital Radiography: A Comparison With Assessments by Radiologists

AJR Am J Roentgenol. 2019 Jul;213(1):155-162. doi: 10.2214/AJR.18.20817. Epub 2019 Mar 27.

Abstract

OBJECTIVE. The objective of our study was to compare the sensitivity of a deep learning (DL) algorithm with the assessments by radiologists in diagnosing osteonecrosis of the femoral head (ONFH) using digital radiography.

MATERIALS AND METHODS. We performed a two-center, retrospective, noninferiority study of consecutive patients (≥ 16 years old) with a diagnosis of ONFH based on MR images. We investigated the following four datasets of unilaterally cropped hip anteroposterior radiographs: training (n = 1346), internal validation (n = 148), temporal external test (n = 148), and geographic external test (n = 250). Diagnostic performance was measured for a DL algorithm, a less experienced radiologist, and an experienced radiologist. Noninferiority analyses for sensitivity were performed for the DL algorithm and both radiologists. Subgroup analysis for precollapse and postcollapse ONFH was done.

RESULTS. Overall, 1892 hips (1037 diseased and 855 normal) were included. Sensitivity and specificity for the temporal external test set were 84.8% and 91.3% for the DL algorithm, 77.6% and 100.0% for the less experienced radiologist, and 82.4% and 100.0% for the experienced radiologist. Sensitivity and specificity for the geographic external test set were 75.2% and 97.2% for the DL algorithm, 77.6% and 75.0% for the less experienced radiologist, and 78.0% and 86.1% for the experienced radiologist. The sensitivity of the DL algorithm was noninferior to that of the assessments by both radiologists. The DL algorithm was more sensitive for precollapse ONFH than the assessment by the less experienced radiologist in the temporal external test set (75.9% vs 57.4%; 95% CI of the difference, 4.5-32.8%).

CONCLUSION. The sensitivity of the DL algorithm for diagnosing ONFH using digital radiography was noninferior to that of both less experienced and experienced radiologist assessments.

Keywords: machine learning; osteonecrosis of femoral head; radiography; sensitivity.
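
The noninferiority comparison of sensitivities described in the abstract can be illustrated with a minimal sketch. The abstract does not state the noninferiority margin or the interval method, so the 10-percentage-point margin, the Wald interval for paired proportions, the function names, and all counts below are assumptions for illustration only; they are not the study's method or data.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)


def paired_sensitivity_difference(both, a_only, b_only, neither, z=1.96):
    """Difference in sensitivity (reader A minus reader B) on the same diseased hips.

    both    : diseased hips detected by A and B
    a_only  : detected by A only
    b_only  : detected by B only
    neither : missed by both
    Returns (difference, ci_lower, ci_upper) using a Wald interval for
    paired proportions (an assumed method, not taken from the abstract).
    """
    n = both + a_only + b_only + neither
    diff = (a_only - b_only) / n
    se = math.sqrt(a_only + b_only - (a_only - b_only) ** 2 / n) / n
    return diff, diff - z * se, diff + z * se


def is_noninferior(ci_lower, margin=0.10):
    """A is noninferior to B if the lower CI bound stays above -margin.

    The default margin of 0.10 is hypothetical.
    """
    return ci_lower > -margin


# Hypothetical counts, NOT the study data.
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=90, fp=10)
diff, lower, upper = paired_sensitivity_difference(
    both=60, a_only=20, b_only=10, neither=10
)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
print(f"difference={diff:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
print("noninferior:", is_noninferior(lower))
```

In this framing, an interval whose lower bound lies above zero, such as the 4.5-32.8% interval reported for precollapse ONFH, indicates higher sensitivity rather than noninferiority alone.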