Predicting Patient Demographics From Chest Radiographs With Deep Learning

J Am Coll Radiol. 2022 Oct;19(10):1151-1161. doi: 10.1016/j.jacr.2022.06.008. Epub 2022 Aug 11.

Abstract

Background: Deep learning models increasingly inform medical decision making, for instance in the detection of acute intracranial hemorrhage and pulmonary embolism. However, many models are trained on medical image databases that poorly represent the diversity of the patients they serve. As a result, these models may perform worse when assisting providers with important medical decisions for underrepresented populations.

Purpose: To assess the ability of deep learning models to classify an individual patient's self-reported gender, age, self-reported ethnicity, and insurance status from a chest radiograph.

Methods: Models were trained and tested on 55,174 radiographs from the MIMIC Chest X-ray (MIMIC-CXR) database. External validation data came from two separate sources: the CheXpert database and, after institutional review board approval, a multihospital urban health care system. Model performance was evaluated using macro-averaged area under the curve (AUC) values. The code used in this study is open source and available at https://github.com/ai-bias/cxr-bias and pixelstopatients.com/models/demographics.
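For multiclass targets such as age groups or ethnicity, macro-averaging is commonly defined as computing a one-vs-rest AUC for each class and taking their unweighted mean, so small classes count as much as large ones. The sketch below assumes that definition; the abstract does not spell out the averaging scheme, and the function names `binary_auc` and `macro_auc` are hypothetical, not taken from the study's repository. Each binary AUC is computed as the fraction of positive-negative pairs ranked correctly (the Mann-Whitney formulation), which is quadratic in the number of cases and meant only for illustration.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a correct ranking (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def macro_auc(labels: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of one-vs-rest AUCs over all classes (assumed definition).

    labels: integer class index per case (e.g. a hypothetical age bin).
    probs:  per-case list of predicted class probabilities.
    """
    n_classes = len(probs[0])
    per_class = []
    for k in range(n_classes):
        # Class k is "positive"; every other class is "negative".
        y_k = [1 if y == k else 0 for y in labels]
        s_k = [row[k] for row in probs]
        per_class.append(binary_auc(y_k, s_k))
    return sum(per_class) / n_classes
```

For example, `macro_auc([0, 0, 1, 1], [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.8, 0.2]])` returns 0.75, because each class's one-vs-rest AUC is 3/4. In practice a library routine (for instance scikit-learn's `roc_auc_score` with `multi_class="ovr"` and `average="macro"`) would replace this pairwise loop.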

Results: Models predicted gender nearly perfectly, with an AUC of 0.999 (95% confidence interval: 0.99-0.99) on held-out test data and 0.994 (0.99-0.99) and 0.997 (0.99-0.99) on external validation data. Age and ethnicity were predicted with high accuracy, with AUCs ranging from 0.854 (0.80-0.91) to 0.911 (0.88-0.94). Insurance status was predicted with moderate accuracy, with AUCs ranging from 0.705 (0.60-0.81) on held-out test data to 0.675 (0.54-0.79) on external validation data.
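The abstract reports a 95% confidence interval for each AUC but does not state how the intervals were computed. One common approach, shown here only as an assumed illustration and not as the authors' procedure, is the percentile bootstrap: resample test cases with replacement, recompute the AUC on each resample, and read off the 2.5th and 97.5th percentiles. The names `binary_auc` and `bootstrap_auc_ci`, and the defaults `n_boot=1000` and `seed=0`, are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point AUC, lower bound, upper bound) via the percentile bootstrap."""
    if 0 not in labels or 1 not in labels:
        raise ValueError("need both positive and negative cases")
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        # Resample whole cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 not in y or 1 not in y:
            continue  # AUC is undefined when a resample contains only one class
        stats.append(binary_auc(y, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)   # e.g. index 25 of 1000
    hi_idx = n_boot - 1 - lo_idx       # symmetric upper index, e.g. 974
    return binary_auc(labels, scores), stats[lo_idx], stats[hi_idx]
```

Resamples with a single class are skipped rather than counted, which slightly biases the interval for very small or highly imbalanced test sets; for a multiclass macro AUC, the same case-level resampling would be applied before averaging the per-class AUCs.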

Conclusions: Deep learning models can predict the age, self-reported gender, self-reported ethnicity, and insurance status of a patient from a chest radiograph. Visualization techniques are useful for confirming that deep learning models function as intended and for highlighting anatomical regions of interest. These models can be used to check that training data are diverse, helping to ensure that artificial intelligence models work across diverse populations.

Keywords: AI bias; artificial intelligence; chest radiographs; data science.

MeSH terms

  • Artificial Intelligence
  • Deep Learning*
  • Ethnicity
  • Humans
  • Radiography
  • Radiography, Thoracic / methods