Data-Efficient Training of Pure Vision Transformers for the Task of Chest X-ray Abnormality Detection Using Knowledge Distillation

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:1444-1447. doi: 10.1109/EMBC48229.2022.9871372.

Abstract

It is generally believed that vision transformers (ViTs) require huge amounts of data to generalize well, which limits their adoption. The introduction of data-efficient algorithms such as data-efficient image transformers (DeiT) provided an opportunity to explore the application of ViTs in medical imaging, where data scarcity is a limiting factor. In this work, we investigated the possibility of using pure transformers for the task of chest X-ray abnormality detection on a small dataset. Our proposed framework is built on a DeiT structure and benefits from a teacher-student training scheme, with a DenseNet with strong classification performance as the teacher and an adapted ViT as the student. The results show that the performance of transformers is on par with that of convolutional neural networks (CNNs). We achieved a test accuracy of 92.2% for the task of classifying chest X-ray images (normal/pneumonia/COVID-19) on a carefully selected dataset using pure transformers. These results demonstrate that transformers can complement or replace CNNs in achieving state-of-the-art performance in medical imaging applications. The code and models of this work are available at https://github.com/Ouantimb-Lab/DeiTCovid.
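Illustrative distillation sketch

As a rough illustration of the teacher-student scheme described in the abstract, the following PyTorch sketch trains a ViT student against a DenseNet teacher with DeiT-style hard-label distillation. This is a minimal sketch, not the authors' released code: the choice of torchvision's densenet121 and vit_b_16, the 0.5/0.5 loss weighting, the learning rate, and the use of a single classification head (the original DeiT instead adds a dedicated distillation token and head) are all assumptions made for illustration.

    # Minimal sketch of DeiT-style hard-label distillation for 3-class
    # chest X-ray classification (normal / pneumonia / COVID-19).
    # Assumes PyTorch and torchvision >= 0.13; models and hyperparameters
    # are illustrative, not the authors' exact setup.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.models import densenet121, vit_b_16

    NUM_CLASSES = 3  # normal=0, pneumonia=1, COVID-19=2

    # Teacher: a DenseNet classifier, frozen during distillation.
    # (In practice its head would first be fine-tuned on the X-ray task;
    # here it is newly initialized only so the output shapes line up.)
    teacher = densenet121(weights="DEFAULT")
    teacher.classifier = nn.Linear(teacher.classifier.in_features, NUM_CLASSES)
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)

    # Student: a ViT adapted to the 3-class task with a single head
    # (the real DeiT adds a separate distillation token and head).
    student = vit_b_16(weights="DEFAULT")
    student.heads.head = nn.Linear(student.heads.head.in_features, NUM_CLASSES)

    optimizer = torch.optim.AdamW(student.parameters(), lr=3e-4)

    def hard_distillation_loss(student_logits, teacher_logits, targets):
        """DeiT-style hard distillation: average the cross-entropy against
        the true labels and against the teacher's hard predictions."""
        ce_true = F.cross_entropy(student_logits, targets)
        ce_teacher = F.cross_entropy(student_logits, teacher_logits.argmax(dim=1))
        return 0.5 * ce_true + 0.5 * ce_teacher

    # One training step on dummy data; real training would iterate over a
    # labeled chest X-ray DataLoader with 3-channel 224x224 inputs.
    images = torch.randn(4, 3, 224, 224)
    targets = torch.randint(0, NUM_CLASSES, (4,))

    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)

    loss = hard_distillation_loss(student_logits, teacher_logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"distillation loss: {loss.item():.4f}")

In a full pipeline, the teacher would be fine-tuned on the same labeled chest X-ray data before its hard predictions are used to supervise the student, which is what makes the scheme effective on small datasets.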

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • COVID-19* / diagnostic imaging
  • Humans
  • Neural Networks, Computer
  • Radiography
  • X-Rays