Learning from imbalanced COVID-19 chest X-ray (CXR) medical imaging data

Methods. 2022 Jun:202:31-39. doi: 10.1016/j.ymeth.2021.06.002. Epub 2021 Jun 4.

Abstract

Digital medical image analysis is a continually evolving field of prominent and growing importance from both research and deployment perspectives. Nonetheless, the algorithms and methodology used, as well as the sources of medical image data, must be strictly scrutinized. As the COVID-19 pandemic has gripped much of the world, much effort has gone into developing affordable testing for the masses, and it has been shown that established and widely available chest X-ray (CXR) images may be used as a screening criterion for assistive diagnosis. Thanks to the dedicated work of various individuals and organizations, CXR images of COVID-19 subjects are publicly available for analysis. We have also provided a publicly available CXR dataset on the Kaggle platform. As a case study, this paper presents a systematic approach to learning from a typically imbalanced set of CXR images that includes a limited number of publicly available COVID-19 images. Our results show that we are able to outperform the top finishers in a related Kaggle multi-class CXR challenge. The proposed methodology should help guide medical personnel in obtaining a robust diagnostic model to discern COVID-19 from other conditions confidently.

Keywords: COVID-19; Chest X-ray; Deep neural networks; Imbalanced data; Medical imaging; Transfer learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19* / diagnostic imaging
  • Deep Learning*
  • Humans
  • Pandemics
  • SARS-CoV-2
  • Tomography, X-Ray Computed / methods
  • X-Rays