Limited generalizability of deep learning algorithm for pediatric pneumonia classification on external data

Emerg Radiol. 2022 Feb;29(1):107-113. doi: 10.1007/s10140-021-01954-x. Epub 2021 Oct 14.

Abstract

Purpose: (1) Develop a deep learning system (DLS) to identify pneumonia in pediatric chest radiographs, and (2) evaluate its generalizability by comparing its performance on internal versus external test datasets.

Methods: Radiographs of patients aged 1 to 5 years from the Guangzhou Women and Children's Medical Center (Guangzhou dataset) and the NIH ChestXray14 dataset were included. We used 5232 radiographs from the Guangzhou dataset to train a ResNet-50 deep convolutional neural network (DCNN) to identify pediatric pneumonia. The DCNN was tested on a holdout set of 624 radiographs from the Guangzhou dataset (internal test set) and on 383 radiographs from the NIH ChestXray14 dataset (external test set). Receiver operating characteristic (ROC) curves were generated, and the areas under the curves (AUCs) were compared using the DeLong method. Colored heatmaps were generated with class activation mapping (CAM) to identify the image regions that most influenced the DCNN's decisions.
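The abstract names class activation mapping but gives no implementation details. As a hedged illustration of the general technique, the NumPy sketch below computes a CAM for one image in the original formulation: the heatmap for a class is the weighted sum of the last convolutional layer's feature maps, using that class's weights from the fully connected layer that follows global average pooling (the layout ResNet-50 uses). The helper name `class_activation_map`, the min-max scaling, and the nearest-neighbour upsampling are illustrative assumptions, not the authors' code.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx, upsample=1):
    """Class activation map for a single image (hypothetical helper).

    feature_maps: (K, H, W) activations of the last convolutional layer
    fc_weights:   (C, K) weights of the fully connected layer that follows
                  global average pooling
    class_idx:    row of fc_weights for the class to explain (e.g. pneumonia)
    upsample:     integer factor for nearest-neighbour upsampling
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    fc_weights = np.asarray(fc_weights, dtype=float)
    if fc_weights.shape[1] != feature_maps.shape[0]:
        raise ValueError("fc_weights needs one column per feature map")
    # M_c(x, y) = sum_k w[c, k] * f_k(x, y)
    cam = np.einsum("k,khw->hw", fc_weights[class_idx], feature_maps)
    # Min-max scale to [0, 1] so the map can be rendered as a heatmap
    lo, hi = cam.min(), cam.max()
    cam = (cam - lo) / (hi - lo) if hi > lo else np.zeros_like(cam)
    # Nearest-neighbour upsampling toward the input resolution
    if upsample > 1:
        cam = np.repeat(np.repeat(cam, upsample, axis=0), upsample, axis=1)
    return cam
```

Assuming a standard ResNet-50 with a 224 x 224 input, `feature_maps` would have shape (2048, 7, 7) and `upsample=32` would bring the map back to 224 x 224 for overlay on the radiograph. Nearest-neighbour upsampling keeps the sketch dependency-free; bilinear interpolation gives the smoother heatmaps usually shown in figures.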

Results: The DCNN achieved AUCs of 0.95 and 0.54 for identifying pneumonia on the internal and external test sets, respectively (p < 0.0001). Heatmaps showed that the algorithm focused on clinically relevant features in images from the internal test set but not in images from the external test set.
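The reported p-value comes from a DeLong comparison of the two AUCs. Because the internal and external test sets contain different images, the two ROC curves are unpaired; under that assumption (the abstract does not state which variant was used), the comparison reduces to a z-test on the difference of two independent AUCs, each with DeLong's nonparametric variance estimate. The minimal NumPy sketch below illustrates this; the function names are hypothetical, and `scores`/`labels` stand for the model's predicted pneumonia probabilities and the ground-truth labels of one test set.

```python
import math
import numpy as np

def auc_and_delong_variance(scores, labels):
    """AUC and its DeLong variance for one test set (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")
    # psi[i, j] = 1 if positive i outranks negative j, 0.5 on ties, else 0
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)
    auc = psi.mean()  # Mann-Whitney estimate of the AUC
    v10 = psi.mean(axis=1)  # structural components for positive cases
    v01 = psi.mean(axis=0)  # structural components for negative cases
    var = v10.var(ddof=1) / m + v01.var(ddof=1) / n
    return auc, var

def compare_independent_aucs(scores_a, labels_a, scores_b, labels_b):
    """Two-sided z-test for AUCs estimated on two disjoint test sets."""
    auc_a, var_a = auc_and_delong_variance(scores_a, labels_a)
    auc_b, var_b = auc_and_delong_variance(scores_b, labels_b)
    se = math.sqrt(var_a + var_b)
    if se == 0:
        raise ValueError("zero standard error; the z-test is undefined")
    z = (auc_a - auc_b) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p
```

In use, one would pass the internal-set scores and labels as the first pair and the external-set scores and labels as the second. If both AUCs were instead measured on the same images (for example, two models on one test set), the paired DeLong form, which adds a covariance term, would be the appropriate choice.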

Conclusion: Our model performed well on the internal test set but significantly worse (by AUC) on the external test set. Likewise, the heatmaps differed markedly in the clinical relevance of the features they highlighted for internal versus external images. This study underscores potential limitations in the generalizability of such deep learning systems.

Keywords: Chest radiograph; Deep learning; Machine learning; Pneumonia.

MeSH terms

  • Algorithms
  • Child
  • Child, Preschool
  • Deep Learning*
  • Female
  • Humans
  • Infant
  • Neural Networks, Computer
  • Pneumonia* / diagnostic imaging
  • Retrospective Studies