Development and validation of open-source deep neural networks for comprehensive chest x-ray reading: a retrospective, multicentre study

Lancet Digit Health. 2024 Jan;6(1):e44-e57. doi: 10.1016/S2589-7500(23)00218-2. Epub 2023 Dec 8.

Abstract

Background: Artificial intelligence (AI) systems for automated chest x-ray interpretation hold promise for standardising reporting and reducing delays in health systems with shortages of trained radiologists. Yet few freely accessible AI systems trained on large datasets are available for practitioners to use with their own data, which slows the clinical deployment of AI in radiology. We aimed to contribute an AI system for comprehensive chest x-ray abnormality detection.

Methods: In this retrospective cohort study, we developed open-source neural networks, X-Raydar and X-Raydar-NLP, for classifying common chest x-ray findings from images and their free-text reports. Our networks were developed using data from six UK hospitals across three National Health Service (NHS) Trusts (University Hospitals Coventry and Warwickshire NHS Trust, University Hospitals Birmingham NHS Foundation Trust, and University Hospitals Leicester NHS Trust), which collectively contributed 2 513 546 chest x-ray studies acquired over a 13-year period (2006-19), yielding 1 940 508 usable free-text radiological reports written by the contemporary assessing radiologists (collectively referred to as the "historic reporters") and 1 896 034 frontal images. Chest x-rays were labelled under a taxonomy of 37 findings by a custom-trained natural language processing (NLP) algorithm, X-Raydar-NLP, applied to the original free-text reports. X-Raydar-NLP was trained on 23 230 manually annotated reports and tested on 4551 reports from all hospitals. 1 694 921 labelled images from the training set and 89 238 from the validation set were then used to train a multi-label image classifier. Our algorithms were evaluated on three retrospective datasets: a set of exams sampled randomly from the full NHS dataset, reported during clinical practice, and annotated using NLP (n=103 328); a consensus set sampled from all six hospitals and annotated under research conditions by three expert radiologists (two independent annotators per image, with a third consultant resolving disagreements) (n=1427); and an independent dataset, MIMIC-CXR, consisting of NLP-annotated exams (n=252 374).
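The abstract does not describe the network architecture or training configuration; the following is a minimal sketch of the multi-label set-up implied above (one independent binary output per taxonomy finding, trained on NLP-derived labels), assuming a standard convolutional backbone. The backbone, hyperparameters, and helper names are illustrative assumptions, not the authors' published code.

    # Minimal sketch of a multi-label chest x-ray classifier over the 37-finding
    # taxonomy described in Methods. The ResNet-50 backbone, learning rate, and
    # helper names are illustrative assumptions, not the published X-Raydar model.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    NUM_FINDINGS = 37  # size of the taxonomy stated in the abstract

    model = models.resnet50(weights=None)
    model.fc = nn.Linear(model.fc.in_features, NUM_FINDINGS)

    # Each study can carry several findings at once, so each output is an
    # independent binary classifier trained with binary cross-entropy.
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    def train_step(images, labels):
        # images: (B, 3, H, W) float tensor (grayscale x-rays replicated to
        # 3 channels); labels: (B, 37) float tensor of NLP-derived {0, 1} targets.
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()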

Findings: X-Raydar achieved a mean AUC of 0·919 (SD 0·039) on the auto-labelled set, 0·864 (0·102) on the consensus set, and 0·842 (0·074) on the MIMIC-CXR test set, demonstrating performance similar to that of the historic clinical radiologist reporters, as assessed on the consensus set, for multiple clinically important findings, including pneumothorax, parenchymal opacification, and parenchymal mass or nodules. On the consensus set, X-Raydar significantly outperformed the historic reporters' balanced accuracy on 27 of 37 findings, was non-inferior on nine, and was inferior on one, giving an average improvement of 13·3% (SD 13·1) to 0·763 (0·110), including a mean 5·6% (13·2) improvement on critical findings to 0·826 (0·119).
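For readers checking the arithmetic, the summary statistics quoted above (mean AUC with SD across findings, and per-finding balanced accuracy) follow directly from per-exam predictions, as in this sketch; the array names and the 0·5 decision threshold are assumptions for illustration, not a description of the authors' evaluation code.

    # Illustrative computation of the summary metrics in Findings: per-finding
    # AUC and balanced accuracy, averaged over the 37 findings. y_true holds
    # {0, 1} labels and y_score holds sigmoid outputs, both shaped (n_exams, 37).
    import numpy as np
    from sklearn.metrics import roc_auc_score, balanced_accuracy_score

    def mean_auc(y_true, y_score):
        aucs = [roc_auc_score(y_true[:, k], y_score[:, k])
                for k in range(y_true.shape[1])]
        # cf the mean AUC of 0.919 (SD 0.039) reported on the auto-labelled set
        return np.mean(aucs), np.std(aucs)

    def mean_balanced_accuracy(y_true, y_score, threshold=0.5):
        # Balanced accuracy is the mean of sensitivity and specificity,
        # computed per finding after thresholding the scores.
        y_pred = (y_score >= threshold).astype(int)
        baccs = [balanced_accuracy_score(y_true[:, k], y_pred[:, k])
                 for k in range(y_true.shape[1])]
        return np.mean(baccs), np.std(baccs)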

Interpretation: Our study shows that automated classification of chest x-rays under a comprehensive taxonomy can achieve performance similar to that of the historic reporters and generalise robustly to external data. The open-source neural networks can serve as foundation models for further research and are freely available to the research community.

Funding: Wellcome Trust.

Publication types

  • Multicenter Study

MeSH terms

  • Artificial Intelligence*
  • Humans
  • Image Interpretation, Computer-Assisted*
  • Neural Networks, Computer*
  • Retrospective Studies
  • X-Rays