Multi-center validation of an artificial intelligence system for detection of COVID-19 on chest radiographs in symptomatic patients

Michael D Kuo; Keith W H Chiu; David S Wang; Anna Rita Larici; Dmytro Poplavskiy; Adele Valentini; Alessandro Napoli; Andrea Borghesi; Guido Ligabue; Xin Hao B Fang; Hing Ki C Wong; Sailong Zhang; John R Hunter; Abeer Mousa; Amato Infante; Lorenzo Elia; Salvatore Golemi; Leung Ho P Yu; Christopher K M Hui; Bradley J Erickson

doi:10.1007/s00330-022-08969-z

Eur Radiol. 2023 Jan;33(1):23-33. doi: 10.1007/s00330-022-08969-z. Epub 2022 Jul 2.

Authors

Michael D Kuo¹,², Keith W H Chiu³, David S Wang⁴, Anna Rita Larici⁵,⁶, Dmytro Poplavskiy⁷, Adele Valentini⁸, Alessandro Napoli⁹, Andrea Borghesi¹⁰, Guido Ligabue¹¹,¹², Xin Hao B Fang¹³, Hing Ki C Wong¹⁴, Sailong Zhang³, John R Hunter⁴, Abeer Mousa¹⁵, Amato Infante⁶,¹⁶, Lorenzo Elia⁵,⁶, Salvatore Golemi¹⁰, Leung Ho P Yu¹⁷, Christopher K M Hui¹⁸,¹⁹, Bradley J Erickson¹⁵

Affiliations

¹ Medical Artificial Intelligence Laboratory Program, Department of Diagnostic Radiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China. mikedkuo@gmail.com.
² Ensemble Group Holdings, Ensemblehealth.ai, Scottsdale, AZ, USA. mikedkuo@gmail.com.
³ Medical Artificial Intelligence Laboratory Program, Department of Diagnostic Radiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
⁴ Department of Radiology, Stanford Health Care, Stanford, CA, USA.
⁵ Section of Radiology, Department of Radiological and Hematological Sciences, Università Cattolica del Sacro Cuore, Rome, Italy.
⁶ Department of Diagnostic Imaging, Oncological Radiotherapy and Hematology, Fondazione Policlinico Universitario "A. Gemelli" IRCCS, Rome, Italy.
⁷ Ensemble Group Holdings, Ensemblehealth.ai, Scottsdale, AZ, USA.
⁸ Department of Radiology, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy.
⁹ Department of Radiological, Oncological and Pathological Sciences, Sapienza University of Rome, Rome, Italy.
¹⁰ Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, ASST Spedali Civili of Brescia, Brescia, Italy.
¹¹ Department of Medical and Surgical Sciences for Children & Adults, Modena and Reggio Emilia University, Modena, Italy.
¹² Division of Radiology, Azienda Ospedaliero-Universitaria Policlinico di Modena, Modena, Italy.
¹³ Radiology Department, Queen Mary Hospital, Hong Kong SAR, China.
¹⁴ Radiology Department, United Christian Hospital, Hong Kong SAR, China.
¹⁵ Radiology Department, Mayo Clinic, Rochester, MN, USA.
¹⁶ Columbus Covid 2 Hospital, Rome, Italy.
¹⁷ Department of Mathematics and Information Technology, The Education University of Hong Kong, Hong Kong SAR, China.
¹⁸ Department of Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
¹⁹ Department of Respiratory & Critical Care Medicine, Matilda & War Memorial Hospital, Hong Kong SAR, China.

PMID: 35779089
DOI: 10.1007/s00330-022-08969-z

Abstract

Objectives: While the chest radiograph (CXR) is the first-line imaging investigation in patients with respiratory symptoms, differentiating COVID-19 from other respiratory infections on CXR remains challenging. We developed and validated an AI system for COVID-19 detection on presenting CXR.

Methods: A deep learning model (RadGenX), trained on 168,850 CXRs, was validated on a large international test set of presenting CXRs of symptomatic patients from 9 study sites (US, Italy, and Hong Kong SAR) and 2 public datasets from the US and Europe. Performance was measured by area under the receiver operating characteristic curve (AUC). Bootstrapped simulations were performed to assess performance across a range of potential COVID-19 disease prevalence values (3.33% to 33.3%). Comparison against international radiologists was performed on an independent test set of 852 cases.
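The abstract does not describe how the prevalence simulations were implemented. One common approach, sketched here purely as an assumption, is a stratified bootstrap: positives and negatives are resampled with replacement so that each replicate has the target prevalence, and the metric of interest is recomputed on every replicate. The cohort below is synthetic, and the function names, score distributions, threshold of 0.5, sample size, and replicate count are all hypothetical choices made for illustration, not values from the study.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties count as half a win (Mann-Whitney formulation of the AUC).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def npv_at_threshold(scores, labels, threshold):
    """Fraction of model-negative cases (score < threshold) that are truly negative."""
    called_negative = [y for s, y in zip(scores, labels) if s < threshold]
    if not called_negative:
        raise ValueError("no cases fall below the threshold")
    return called_negative.count(0) / len(called_negative)


def bootstrap_npv(scores, labels, prevalence, threshold, n=300, reps=200, seed=0):
    """Mean NPV and a 95% percentile interval at a simulated prevalence."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    n_pos = round(n * prevalence)  # positives per replicate at the target prevalence
    values = []
    for _ in range(reps):
        sample = rng.choices(pos, k=n_pos) + rng.choices(neg, k=n - n_pos)
        sample_labels = [1] * n_pos + [0] * (n - n_pos)
        values.append(npv_at_threshold(sample, sample_labels, threshold))
    values.sort()
    mean = sum(values) / reps
    return mean, values[reps * 25 // 1000], values[reps * 975 // 1000]


def make_synthetic_cohort(n_pos=400, n_neg=400, seed=1):
    """Hypothetical model scores; NOT data from the study."""
    rng = random.Random(seed)
    scores = [rng.gauss(0.65, 0.15) for _ in range(n_pos)]
    scores += [rng.gauss(0.40, 0.15) for _ in range(n_neg)]
    labels = [1] * n_pos + [0] * n_neg
    return scores, labels


if __name__ == "__main__":
    scores, labels = make_synthetic_cohort()
    print(f"AUC on synthetic cohort: {auc(scores, labels):.3f}")
    for prevalence in (0.333, 0.10, 0.0333):
        mean, low, high = bootstrap_npv(scores, labels, prevalence, threshold=0.5)
        print(f"prevalence {prevalence:.1%}: NPV {mean:.3f} ({low:.3f} to {high:.3f})")
```

Because the resampling is done within each class, AUC is largely unaffected by the simulated prevalence, whereas predictive values shift with it, which is why a simulation of this kind is informative for NPV.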

Results: RadGenX achieved an AUC of 0.89 on 4-fold cross-validation and an AUC of 0.79 (95% CI 0.78-0.80) on an independent test cohort of 5,894 patients. DeLong's test showed statistically significant differences in model performance across patients from different regions (p < 0.01), disease severity (p < 0.001), gender (p < 0.001), and age (p = 0.03). Prevalence simulations showed that the negative predictive value increases from 86.1% at 33.3% prevalence to greater than 98.5% at any prevalence below 4.5%. Compared with radiologists, McNemar's test showed the model has higher sensitivity (p < 0.001) but lower specificity (p < 0.001).
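Two of the quantities reported above can be illustrated with short calculations. For a fixed sensitivity and specificity, NPV follows from Bayes' rule and rises as prevalence falls; and McNemar's test compares two readers on the same cases using only the discordant pairs. The sketch below is a minimal illustration under stated assumptions: the sensitivity, specificity, and discordant-pair counts are hypothetical placeholders (the abstract does not report them), and the exact binomial form of McNemar's test is used here for simplicity, which may differ from the variant the authors applied.

```python
from math import comb


def npv_from_rates(sensitivity, specificity, prevalence):
    """NPV = P(no disease | negative test), via Bayes' rule."""
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases reader A got right and reader B got wrong
    c: cases reader B got right and reader A got wrong
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical operating point, NOT taken from the paper.
    SENSITIVITY, SPECIFICITY = 0.80, 0.60
    for prevalence in (0.333, 0.20, 0.10, 0.045, 0.0333):
        value = npv_from_rates(SENSITIVITY, SPECIFICITY, prevalence)
        print(f"prevalence {prevalence:.1%}: NPV {value:.1%}")

    # Hypothetical discordant counts among truly positive cases:
    # model correct / radiologist wrong = 40, the reverse = 15.
    print(f"McNemar exact p-value: {mcnemar_exact(40, 15):.4f}")
```

Restricting the paired comparison to truly positive cases tests sensitivity, and restricting it to truly negative cases tests specificity, which is how a single model can be significantly better on one and worse on the other.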

Conclusion: An AI model that predicts COVID-19 infection on CXR in symptomatic patients was validated on a large international cohort, providing valuable context on testing and performance expectations for AI systems that perform COVID-19 prediction on CXR.

Key points:
• An AI model developed using CXRs to detect COVID-19 was validated in a large multi-center cohort of 5,894 patients from 9 prospectively recruited sites and 2 public datasets.
• Differences in AI model performance were seen across region, disease severity, gender, and age.
• Prevalence simulations on the international test set demonstrate the model's NPV is greater than 98.5% at any prevalence below 4.5%.

Keywords: Artificial intelligence; COVID-19; Public health; Radiology; Thoracic.

MeSH terms

Artificial Intelligence
COVID-19*
Deep Learning*
Humans
Radiography, Thoracic / methods
Retrospective Studies
Tomography, X-Ray Computed / methods