Automatic Hip Fracture Identification and Functional Subclassification with Deep Learning

Justin D Krogue; Kaiyang V Cheng; Kevin M Hwang; Paul Toogood; Eric G Meinberg; Erik J Geiger; Musa Zaid; Kevin C McGill; Rina Patel; Jae Ho Sohn; Alexandra Wright; Bryan F Darger; Kevin A Padrez; Eugene Ozhinsky; Sharmila Majumdar; Valentina Pedoia

doi:10.1148/ryai.2020190023


Radiol Artif Intell. 2020 Mar 25;2(2):e190023. doi: 10.1148/ryai.2020190023. eCollection 2020 Mar.


Affiliation

¹ Departments of Orthopaedic Surgery (J.D.K., K.M.H., P.T., E.G.M., E.J.G., M.Z.), Emergency Medicine (B.F.D., K.A.P.), and Radiology and Biomedical Imaging (K.C.M., R.P., J.H.S., A.W., E.O., S.M., V.P.), University of California, San Francisco, 6945 Geary Blvd, San Francisco, CA 94121; and Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, Calif (K.V.C.).

Abstract

Purpose: To investigate the feasibility of automatic identification and classification of hip fractures using deep learning, which may improve outcomes by reducing diagnostic errors and decreasing time to operation.

Materials and methods: Hip and pelvic radiographs from 1118 studies were reviewed, and 3026 hips were labeled with bounding boxes and classified as normal, displaced femoral neck fracture, nondisplaced femoral neck fracture, intertrochanteric fracture, previous open reduction and internal fixation, or previous arthroplasty. A deep learning-based object detection model was trained to automate bounding box placement. A densely connected convolutional neural network (DenseNet) was trained on a subset of the bounding box images. Its performance was evaluated on a held-out test set and, on a 100-image subset, compared with two groups of human observers: (a) fellowship-trained radiologists and orthopedists and (b) senior residents in emergency medicine, radiology, and orthopedics.
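To make the per-hip labeling scheme concrete, the Python sketch below encodes the six classes and crops a single hip from a radiograph given its bounding box, which is the kind of per-hip image a classifier would consume. The names `HipLabel`, `HipBox`, `crop`, and `to_binary` are hypothetical. Images are modeled as plain nested lists rather than real pixel arrays. Treating only the three fracture categories as positives for the binary fracture task is an assumption; the abstract does not state how the hardware classes were handled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class HipLabel(Enum):
    """The six per-hip classes named in the abstract."""
    NORMAL = "normal"
    DISPLACED_FNF = "displaced femoral neck fracture"
    NONDISPLACED_FNF = "nondisplaced femoral neck fracture"
    INTERTROCHANTERIC = "intertrochanteric fracture"
    PREVIOUS_ORIF = "previous open reduction and internal fixation"
    PREVIOUS_ARTHROPLASTY = "previous arthroplasty"


# Assumption (not stated in the source): only these classes count as
# "fracture" when the six-way labels are collapsed to a binary task.
FRACTURE_LABELS = frozenset(
    {HipLabel.DISPLACED_FNF, HipLabel.NONDISPLACED_FNF, HipLabel.INTERTROCHANTERIC}
)


@dataclass(frozen=True)
class HipBox:
    """Hypothetical bounding box in pixel coordinates (half-open ranges)."""
    x0: int
    y0: int
    x1: int
    y1: int
    label: HipLabel


def crop(image: List[List[int]], box: HipBox) -> List[List[int]]:
    """Cut the boxed hip out of a row-major image (list of pixel rows)."""
    return [row[box.x0:box.x1] for row in image[box.y0:box.y1]]


def to_binary(labels: Sequence[HipLabel]) -> List[int]:
    """Map six-class labels to 1 (fracture) or 0 (no fracture)."""
    return [int(label in FRACTURE_LABELS) for label in labels]
```

In a real pipeline the boxes would come from the trained object detector rather than manual annotation, and each crop would be resized before being passed to the DenseNet.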

Results: The model's binary accuracy for fracture detection was 93.7% (95% confidence interval [CI]: 90.8%, 96.5%), with a sensitivity of 93.2% (95% CI: 88.9%, 97.1%) and a specificity of 94.2% (95% CI: 89.7%, 98.4%). Multiclass classification accuracy was 90.8% (95% CI: 87.5%, 94.2%). Compared with human observers, the model achieved at least expert-level classification accuracy under all conditions. Additionally, when the model was used as an aid, human performance improved, with aided resident performance approximating unaided fellowship-trained expert performance in the multiclass classification.
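The abstract does not say how its 95% CIs were obtained. As a hedged illustration only, the sketch below computes accuracy, sensitivity, and specificity from per-hip labels and attaches a nonparametric percentile bootstrap CI. The bootstrap method, the resample count, and all function names are assumptions, not the authors' procedure, and the example labels are made up.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

Metric = Callable[[Sequence[int], Sequence[int]], Optional[float]]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Optional[float]:
    """Fraction of hips whose predicted label equals the reference label."""
    if not y_true:
        return None
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> Optional[float]:
    """True-positive rate: fractured hips (label 1) predicted as fractured."""
    preds_on_pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_pos:
        return None  # undefined when the sample contains no fractures
    return sum(p == 1 for p in preds_on_pos) / len(preds_on_pos)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Optional[float]:
    """True-negative rate: non-fractured hips (label 0) predicted as such."""
    preds_on_neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not preds_on_neg:
        return None  # undefined when the sample contains no negatives
    return sum(p == 0 for p in preds_on_neg) / len(preds_on_neg)


def bootstrap_ci(metric: Metric, y_true: Sequence[int], y_pred: Sequence[int],
                 n_resamples: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling hips with replacement."""
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value is not None:  # skip resamples where the metric is undefined
            stats.append(value)
    if not stats:
        raise ValueError("metric was undefined on every resample")
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


if __name__ == "__main__":
    # Toy labels (1 = fracture), not data from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    for name, fn in [("accuracy", accuracy), ("sensitivity", sensitivity),
                     ("specificity", specificity)]:
        print(name, fn(y_true, y_pred), bootstrap_ci(fn, y_true, y_pred))
```

Because `accuracy` compares labels directly, it applies unchanged to the six-class task. `sensitivity` and `specificity` expect binary labels with 1 marking a fracture. Resampling at the level of individual hips also ignores any correlation between two hips from the same study, which a study-level bootstrap would account for.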

Conclusion: A deep learning model identified and classified hip fractures with at least expert-level performance and, when used as an aid, improved human performance, with aided resident performance approximating that of unaided fellowship-trained attending physicians.

Supplemental material is available for this article.

© RSNA, 2020.
