Clinical application of radiological AI for pulmonary nodule evaluation: Replicability and susceptibility to the population shift caused by the COVID-19 pandemic

Int J Med Inform. 2023 Oct:178:105190. doi: 10.1016/j.ijmedinf.2023.105190. Epub 2023 Aug 9.

Abstract

Purpose: replicability and generalizability of medical AI are the recognized challenges that hinder a broad AI deployment in clinical practice. Pulmonary nodes detection and characterization based on chest CT images is one of the demanded use cases for automatization by means of AI, and multiple AI solutions addressing this task are becoming available. Here, we evaluated and compared the performance of several commercially available radiological AI with the same clinical task on the same external datasets acquired before and during the pandemic of COVID-19.

Approach: 5 commercially available AI models for pulmonary nodule detection were tested on two external datasets labelled by experts according to the intended clinical task. Dataset1 was acquired before the pandemic and did not contain radiological signs of COVID-19; dataset2 was collected during the pandemic and did contain radiological signs of COVID-19. ROC-analysis was applied separately for the dataset1 and dataset2 to select probability thresholds for each dataset separately. AUROC, sensitivity and specificity metrics were used to assess and compare the results of AI performance.

Results: Statistically significant differences in AUROC values were observed between the AI models for the dataset1. Whereas for the dataset2 the differences of AUROC values became statistically insignificant. Sensitivity and specificity differed statistically significantly between the AI models for the dataset1. This difference was insignificant for the dataset2 when we applied the probability threshold initially selected for the dataset1. An update of the probability threshold based on the dataset2 created statistically significant differences of sensitivity and specificity between AI models for the dataset2. For 3 out of 5 AI models, the update of the probability threshold was valuable to compensate for the degradation of AI model performances with the population shift caused by the pandemic.

Conclusions: Population shift in the data is able to deteriorate differences of AI models performance. Update of the probability threshold together with the population shift seems to be valuable to preserve AI models performance without retraining them.

Keywords: Artificial Intelligence; COVID-19; Fine-tuning; Lung Cancer; Medical Image Analysis; Replicability of medical machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19* / diagnostic imaging
  • COVID-19* / epidemiology
  • Humans
  • Pandemics
  • Radiography
  • Radiology*
  • Tomography, X-Ray Computed