Machine and Deep Learning for Tuberculosis Detection on Chest X-Rays: Systematic Literature Review

Seng Hansun; Ahmadreza Argha; Siaw-Teng Liaw; Branko G Celler; Guy B Marks

doi:10.2196/43154

J Med Internet Res. 2023 Jul 3;25:e43154. doi: 10.2196/43154.

Authors

Seng Hansun¹ ², Ahmadreza Argha³ ⁴ ⁵, Siaw-Teng Liaw⁶, Branko G Celler⁷, Guy B Marks¹ ²

Affiliations

¹ South West Sydney (SWS), School of Clinical Medicine, University of New South Wales, Sydney, Australia.
² Woolcock Vietnam Research Group, Woolcock Institute of Medical Research, Sydney, Australia.
³ Graduate School of Biomedical Engineering, University of New South Wales, Sydney, Australia.
⁴ Tyree Institute of Health Engineering (IHealthE), University of New South Wales, Sydney, Australia.
⁵ Ageing Future Institute (AFI), University of New South Wales, Sydney, Australia.
⁶ WHO Collaborating Centre (eHealth), School of Population Health, University of New South Wales, Sydney, Australia.
⁷ Biomedical Systems Research Laboratory, School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, Australia.

PMID: 37399055
PMCID: PMC10365622
DOI: 10.2196/43154

Abstract

Background: Tuberculosis (TB) was the leading infectious cause of mortality globally prior to COVID-19, and chest radiography has an important role in the detection, and subsequent diagnosis, of patients with this disease. Conventional expert reading has substantial within- and between-observer variability, indicating poor reliability of human readers. Substantial efforts have been made to use various artificial intelligence-based algorithms to address the limitations of human reading of chest radiographs for diagnosing TB.

Objective: This systematic literature review (SLR) aims to assess the performance of machine learning (ML) and deep learning (DL) in the detection of TB using chest radiography (chest x-ray [CXR]).

Methods: In conducting and reporting the SLR, we followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. A total of 309 records were identified from the Scopus, PubMed, and IEEE (Institute of Electrical and Electronics Engineers) databases. We independently screened, reviewed, and assessed all available records and included 47 studies that met the inclusion criteria in this SLR. We also performed a risk of bias assessment using Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) and a meta-analysis of the 10 included studies that provided confusion matrix results.

Results: Various CXR data sets have been used in the included studies, with the 2 most popular being the Montgomery County (n=29) and Shenzhen (n=36) data sets. DL (n=34) was more commonly used than ML (n=7) in the included studies. Most studies used a human radiologist's report as the reference standard. Support vector machine (n=5), k-nearest neighbors (n=3), and random forest (n=2) were the most popular ML approaches. Meanwhile, convolutional neural networks were the most commonly used DL techniques, with the 4 most popular architectures being ResNet-50 (n=11), VGG-16 (n=8), VGG-19 (n=7), and AlexNet (n=6). Four performance metrics were commonly used, namely, accuracy (n=35), area under the curve (AUC; n=34), sensitivity (n=27), and specificity (n=23). In terms of the performance results, ML showed higher accuracy (mean ~93.71%) and sensitivity (mean ~92.55%), while on average DL models achieved better AUC (mean ~92.12%) and specificity (mean ~91.54%). Based on data from the 10 studies that provided confusion matrix results, we estimated the pooled sensitivity and specificity of ML and DL methods to be 0.9857 (95% CI 0.9477-1.00) and 0.9805 (95% CI 0.9255-1.00), respectively. From the risk of bias assessment, 17 studies were regarded as having unclear risks for the reference standard aspect and 6 studies were regarded as having unclear risks for the flow and timing aspect. Only 2 included studies had built applications based on the proposed solutions.
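
For readers unfamiliar with how sensitivity and specificity follow from a confusion matrix, the following is a minimal sketch in Python. The counts are invented for illustration and are not taken from any included study, and the names `ConfusionMatrix` and `pool_by_summing` are hypothetical. Pooling here is done by simply summing counts across studies, which is an assumption made for brevity; the abstract does not state which pooling model was used, and a formal meta-analysis would typically fit a statistical model rather than add counts.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 diagnostic counts for one study (hypothetical container)."""
    tp: int  # TB cases correctly flagged
    fp: int  # non-TB cases wrongly flagged
    fn: int  # TB cases missed
    tn: int  # non-TB cases correctly cleared


def sensitivity(cm: ConfusionMatrix) -> float:
    """True positive rate: TP / (TP + FN)."""
    return cm.tp / (cm.tp + cm.fn)


def specificity(cm: ConfusionMatrix) -> float:
    """True negative rate: TN / (TN + FP)."""
    return cm.tn / (cm.tn + cm.fp)


def pool_by_summing(studies: Iterable[ConfusionMatrix]) -> Tuple[float, float]:
    """Naive pooling: add the counts of all studies, then compute both rates.

    This is an illustrative simplification, not the method of the review.
    """
    studies = list(studies)
    total = ConfusionMatrix(
        tp=sum(s.tp for s in studies),
        fp=sum(s.fp for s in studies),
        fn=sum(s.fn for s in studies),
        tn=sum(s.tn for s in studies),
    )
    return sensitivity(total), specificity(total)


# Invented example counts for two studies (not real data).
example_studies = [
    ConfusionMatrix(tp=90, fp=5, fn=10, tn=95),
    ConfusionMatrix(tp=45, fp=10, fn=5, tn=40),
]

pooled_sens, pooled_spec = pool_by_summing(example_studies)
print(f"pooled sensitivity = {pooled_sens:.4f}")
print(f"pooled specificity = {pooled_spec:.4f}")
```

Summing counts weights each study by its sample size, so one large study can dominate the pooled figure; that is one reason model-based pooling is usually preferred in practice.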

Conclusions: Findings from this SLR confirm the high potential of both ML and DL for TB detection using CXR. Future studies need to pay close attention to 2 aspects of risk of bias, namely, the reference standard and the flow and timing aspects.

Trial registration: PROSPERO CRD42021277155; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=277155.

Keywords: PRISMA guidelines; QUADAS-2; chest x-rays; convolutional neural networks; diagnostic test accuracy; machine and deep learning; risk of bias; sensitivity and specificity; systematic literature review; tuberculosis detection.

©Seng Hansun, Ahmadreza Argha, Siaw-Teng Liaw, Branko G Celler, Guy B Marks. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 03.07.2023.

Publication types

Meta-Analysis
Systematic Review

MeSH terms

Artificial Intelligence
COVID-19*
Deep Learning*
Humans
Radiography
Reproducibility of Results
Tuberculosis* / diagnosis
X-Rays