STERN: Attention-driven Spatial Transformer Network for abnormality detection in chest X-ray images

Artif Intell Med. 2024 Jan:147:102737. doi: 10.1016/j.artmed.2023.102737. Epub 2023 Nov 30.

Abstract

Chest X-ray scans are frequently requested to detect the presence of abnormalities, due to their low-cost and non-invasive nature. The interpretation of these images can be automated to prioritize more urgent exams through deep learning models, but the presence of image artifacts, e.g. lettering, often generates a harmful bias in the classifiers and an increase of false positive results. Consequently, healthcare would benefit from a system that selects the thoracic region of interest prior to deciding whether an image is possibly pathologic. The current work tackles this binary classification exercise, in which an image is either normal or abnormal, using an attention-driven and spatially unsupervised Spatial Transformer Network (STERN), that takes advantage of a novel domain-specific loss to better frame the region of interest. Unlike the state of the art, in which this type of networks is usually employed for image alignment, this work proposes a spatial transformer module that is used specifically for attention, as an alternative to the standard object detection models that typically precede the classifier to crop out the region of interest. In sum, the proposed end-to-end architecture dynamically scales and aligns the input images to maximize the classifier's performance, by selecting the thorax with translation and non-isotropic scaling transformations, and thus eliminating artifacts. Additionally, this paper provides an extensive and objective analysis of the selected regions of interest, by proposing a set of mathematical evaluation metrics. The results indicate that the STERN achieves similar results to using YOLO-cropped images, with reduced computational cost and without the need for localization labels. More specifically, the system is able to distinguish abnormal frontal images from the CheXpert dataset, with a mean AUC of 85.67% - a 2.55% improvement vs. the 0.98% improvement achieved by the YOLO-based counterpart in comparison to a standard baseline classifier. At the same time, the STERN approach requires less than 2/3 of the training parameters, while increasing the inference time per batch in less than 2 ms. Code available via GitHub.

Keywords: Attention module; Binary classification; Deep learning; Object detection; Pathology; Thorax.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artifacts*
  • Exercise
  • Thorax*
  • X-Rays