Improving convolutional neural networks performance for image classification using test time augmentation: a case study using MURA dataset

Ibrahem Kandel; Mauro Castelli

doi:10.1007/s13755-021-00163-7

Health Inf Sci Syst. 2021 Jul 31;9(1):33. doi: 10.1007/s13755-021-00163-7. eCollection 2021 Dec.

Authors

Ibrahem Kandel¹, Mauro Castelli¹

Affiliation

¹ Nova Information Management School (NOVA IMS), Universidade Nova de Lisboa, Campus de Campolide, 1070-312 Lisboa, Portugal.

Abstract

Bone fractures are one of the main reasons for visiting the emergency room (ER), and X-ray imaging is the primary method for detecting them. Classifying X-ray images requires an experienced radiologist, but one is not always available in the ER. An accurate automatic X-ray image classifier in the ER can help reduce error rates by giving the emergency doctor an instant second opinion. Deep learning, an emerging trend in artificial intelligence, makes it possible to train an automatic classifier for musculoskeletal images. Image augmentation techniques have proven useful for improving the performance of deep learning models. In image classification, augmentation is usually applied while training the network, not during the testing phase. Test time augmentation (TTA) can improve a model's predictions by generating several transformed versions of the same image at negligible computational cost. In this paper, we investigated the effect of TTA on image classification performance on the MURA dataset. We evaluated nine different augmentation techniques, comparing each against predictions made without TTA. We also assessed two ensemble techniques: majority vote and average vote. Our results show that TTA increased classification performance significantly, especially for models with low baseline scores.
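To make the two ensemble rules concrete, here is a minimal Python sketch of TTA for a binary (normal/abnormal) classifier. It is not the authors' implementation. `toy_model` is a hypothetical stand-in for a trained CNN that returns P(abnormal). The three views (identity, horizontal flip, vertical flip) are illustrative and not necessarily among the paper's nine augmentations. We also assume that "average vote" means averaging the per-view probabilities before thresholding at 0.5, while "majority vote" thresholds each view first and then counts labels.

```python
from statistics import mean
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities in [0, 1]


def identity(img: Image) -> Image:
    """Unmodified view (a copy of the original image)."""
    return [row[:] for row in img]


def hflip(img: Image) -> Image:
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror the image top-to-bottom."""
    return [row[:] for row in img[::-1]]


def toy_model(img: Image) -> float:
    """Hypothetical stand-in for a trained CNN: returns the mean brightness of the
    left half of the image as P(abnormal). Deliberately not flip-invariant, so the
    augmented views produce different predictions."""
    width = len(img[0])
    left = [px for row in img for px in row[: width // 2]]
    return mean(left)


def tta_probabilities(model: Callable[[Image], float], img: Image,
                      transforms: List[Callable[[Image], Image]]) -> List[float]:
    """Run the model once per augmented view of the same image."""
    return [model(t(img)) for t in transforms]


def average_vote(probs: List[float], threshold: float = 0.5) -> int:
    """Assumed average vote: mean of the probabilities, thresholded once."""
    return int(mean(probs) >= threshold)


def majority_vote(probs: List[float], threshold: float = 0.5) -> int:
    """Assumed majority vote: threshold each view, then take the most common
    label. Ties are resolved toward the abnormal class (1), an assumption."""
    votes = [int(p >= threshold) for p in probs]
    return int(sum(votes) * 2 >= len(votes))


# Usage on a tiny 2x4 example image.
example = [[0.9, 0.8, 0.1, 0.2],
           [0.7, 0.6, 0.3, 0.0]]
probs = tta_probabilities(toy_model, example, [identity, hflip, vflip])
print("per-view probabilities:", [round(p, 2) for p in probs])
print("average vote:", average_vote(probs), "majority vote:", majority_vote(probs))
```

The two rules can disagree. For per-view probabilities [0.45, 0.45, 0.95], average vote returns 1 (mean of about 0.62), while majority vote returns 0 (one abnormal vote out of three). Majority vote also needs a tie-breaking rule when the number of views is even. This sketch resolves ties toward the abnormal class, which is an assumption.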

Keywords: Convolutional neural networks; Deep learning; Ensemble learning; Image classification; Test time augmentation; Transfer learning.