Assessing the speed-accuracy trade-offs of popular convolutional neural networks for single-crop rib fracture classification

Riel Castro-Zunti; Kum Ju Chae; Younhee Choi; Gong Yong Jin; Seok-Bum Ko

doi:10.1016/j.compmedimag.2021.101937


Comput Med Imaging Graph. 2021 Jul:91:101937. doi: 10.1016/j.compmedimag.2021.101937. Epub 2021 May 15.

Authors

Riel Castro-Zunti¹, Kum Ju Chae², Younhee Choi¹, Gong Yong Jin², Seok-Bum Ko³

Affiliations

¹ Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada.
² Department of Radiology, Chonbuk National University Hospital, 20 Geonji-ro, Geumam 2(i)-dong, Deokjin-gu, Jeonju, Jeollabuk-do 54907, South Korea.
³ Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada. Electronic address: seokbum.ko@usask.ca.

PMID: 34087611

Abstract

Rib fractures are injuries commonly assessed in trauma wards. Deep learning has demonstrated state-of-the-art accuracy for a variety of tasks, including image classification. This paper assesses the speed-accuracy trade-offs and general suitability of four popular convolutional neural networks for classifying rib fractures from axial computed tomography imagery. We applied transfer learning to InceptionV3, ResNet50, MobileNetV2, and VGG16 models and additionally trained "decomposed" models consisting of only the first n blocks of each architecture, for each possible n. Because acute (new) fractures are generally the most important to detect, we trained two types of models: a classful model with the classes acute, old (healed), and normal (non-fractured); and a binary model distinguishing acute fractures from the other two classes. We found that the first 7 blocks of InceptionV3 achieved the best results and the best general speed-accuracy trade-off. The classful model achieved a 5-fold cross-validation average accuracy and macro recall of 96.00% and 94.0%, respectively. The binary model achieved a 5-fold cross-validation average accuracy, macro recall, and area under the receiver operating characteristic curve of 97.76%, 94.6%, and 94.7%, respectively. On a Windows 10 PC with 32 GB RAM and an Nvidia 1080 Ti GPU, the model's average per-crop inference times on the CPU and GPU were 13.6 and 12.2 ms, respectively. Compared to the InceptionV3 Block 7 classful model, a radiologist with 9 years of experience was less accurate but more sensitive to acute fractures; meanwhile, the deep learning model had fewer false positive diagnoses and better sensitivity to old fractures and normal ribs. Cohen's kappa between the radiologist and the model was 0.813.
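The abstract reports macro recall (for both models) and Cohen's kappa (radiologist vs. model) without defining them. The following is a minimal plain-Python sketch of both metrics, plus the acute-vs-other label collapse that the binary model implies. The function names (`to_binary`, `macro_recall`, `cohens_kappa`) and the toy labels are hypothetical illustrations, not the authors' code or data; macro recall is assumed here to be the unweighted mean of per-class recall over the classes present in the ground truth.

```python
from collections import Counter


def to_binary(labels):
    """Collapse classful labels (acute/old/normal) into acute vs. other."""
    return ["acute" if y == "acute" else "other" for y in labels]


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def cohens_kappa(a, b):
    """Cohen's kappa between two raters' label sequences of equal length."""
    n = len(a)
    # Observed agreement: fraction of items both raters labelled identically.
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined, so this sketch returns 1.0 by convention (an assumption).
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy example: six rib crops, "reference" labels vs. a model's predictions.
y_true = ["acute", "acute", "old", "normal", "normal", "old"]
y_pred = ["acute", "old", "old", "normal", "normal", "normal"]

print(f"{macro_recall(y_true, y_pred):.4f}")                        # 0.6667
print(f"{macro_recall(to_binary(y_true), to_binary(y_pred)):.4f}")  # 0.7500
print(f"{cohens_kappa(y_true, y_pred):.4f}")                        # 0.5000
```

Macro recall weights each class equally regardless of how many crops it contains, which matters when acute fractures are rarer than normal ribs; kappa discounts the agreement two raters would reach by chance given their label frequencies.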

Keywords: Convolutional neural networks; Deep learning; Radiology; Rib fractures.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Deep Learning*
Humans
Neural Networks, Computer
Research Design
Rib Fractures* / diagnostic imaging
Tomography, X-Ray Computed