Self-evolving vision transformer for chest X-ray diagnosis through knowledge distillation

Sangjoon Park; Gwanghyun Kim; Yujin Oh; Joon Beom Seo; Sang Min Lee; Jin Hwan Kim; Sungjun Moon; Jae-Kwang Lim; Chang Min Park; Jong Chul Ye

doi:10.1038/s41467-022-31514-x

Nat Commun. 2022 Jul 4;13(1):3848. doi: 10.1038/s41467-022-31514-x.

Authors

Sangjoon Park¹, Gwanghyun Kim¹, Yujin Oh¹, Joon Beom Seo², Sang Min Lee², Jin Hwan Kim³, Sungjun Moon⁴, Jae-Kwang Lim⁵, Chang Min Park⁶, Jong Chul Ye⁷ ⁸

Affiliations

¹ Department of Bio and Brain Engineering, KAIST, Daejeon, Korea.
² Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea.
³ College of Medicine, Chungnam National University, Daejeon, South Korea.
⁴ College of Medicine, Yeungnam University, Daegu, South Korea.
⁵ School of Medicine, Kyungpook National University, Daegu, South Korea.
⁶ College of Medicine, Seoul National University, Seoul, South Korea.
⁷ Department of Bio and Brain Engineering, KAIST, Daejeon, Korea. jong.ye@kaist.ac.kr.
⁸ Kim Jaechul Graduate School of AI, KAIST, Daejeon, Korea. jong.ye@kaist.ac.kr.

Abstract

Although deep learning-based computer-aided diagnosis systems have recently achieved expert-level performance, developing a robust model requires large, high-quality data with annotations that are expensive to obtain. This situation poses a conundrum: chest X-rays collected annually cannot be utilized because they lack labels, especially in deprived areas. In this study, we present a framework named distillation for self-supervision and self-train learning (DISTL), inspired by the learning process of radiologists, which can improve the performance of a vision transformer simultaneously with self-supervision and self-training through knowledge distillation. In external validation from three hospitals for the diagnosis of tuberculosis, pneumothorax, and COVID-19, DISTL offers gradually improved performance as the amount of unlabeled data increases, even better than the fully supervised model with the same amount of labeled data. We additionally show that the model obtained with DISTL is robust to various real-world nuisances, offering better applicability in clinical settings.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
COVID-19* / diagnostic imaging
Diagnosis, Computer-Assisted
Humans
Radiography
X-Rays