Annotations of Lung Abnormalities in Shenzhen Chest X-ray Dataset for Computer-Aided Screening of Pulmonary Diseases

Data (Basel). 2022 Jul;7(7):95. doi: 10.3390/data7070095. Epub 2022 Jul 13.

Abstract

Developments in deep learning techniques have led to significant advances in automated abnormality detection in radiological images and paved the way for their potential use in computer-aided diagnosis (CAD) systems. However, the development of CAD systems for pulmonary tuberculosis (TB) diagnosis is hampered by the lack of training data that is of good visual and diagnostic quality, of sufficient size and variety, and, where relevant, accompanied by fine-grained region annotations. This study presents a collection of annotations/segmentations of pulmonary radiological manifestations that are consistent with TB in the publicly available and widely used Shenzhen chest X-ray (CXR) dataset, which was made available by the U.S. National Library of Medicine and obtained via a research collaboration with No. 3 People's Hospital, Shenzhen, China. The goal of releasing these annotations is to advance the state of the art in image segmentation methods and thereby improve the performance of fine-grained segmentation of TB-consistent findings in digital chest X-ray images. The annotation collection comprises the following: 1) annotation files in JSON (JavaScript Object Notation) format that indicate the locations and shapes of 19 lung pattern abnormalities for 336 TB patients; 2) mask files saved in PNG format for each abnormality per TB patient; 3) a CSV (comma-separated values) file that summarizes lung abnormality types and numbers per TB patient. To the best of our knowledge, this is the first collection of pixel-level annotations of TB-consistent findings in CXRs. Dataset: https://data.lhncbc.nlm.nih.gov/public/Tuberculosis-Chest-X-ray-Datasets/Shenzhen-Hospital-CXR-Set/Annotations/index.html.

Keywords: Tuberculosis (TB); abnormalities; annotations; chest X-ray (CXR) images; computer-aided diagnosis.
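
Because the abstract describes one PNG mask per abnormality per patient, a segmentation pipeline may need to merge them into a single patient-level mask. The following is a minimal sketch of that step, not the authors' method. It assumes each mask has already been loaded as a 2-D array in which zero is background and any nonzero value is foreground; that pixel encoding, the function name `union_mask`, and the synthetic arrays are hypothetical and are not taken from the dataset documentation.

```python
import numpy as np


def union_mask(masks):
    """Merge per-abnormality binary masks into one patient-level mask.

    Assumption: 0 is background and any nonzero pixel is foreground.
    Returns a uint8 array with values 0 and 255.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    combined = np.zeros(masks[0].shape, dtype=bool)
    for mask in masks:
        if mask.shape != combined.shape:
            raise ValueError("all masks must share the same shape")
        # A pixel is foreground if any abnormality mask marks it.
        combined |= mask > 0
    return combined.astype(np.uint8) * 255


# Synthetic stand-ins for two abnormality masks of one patient
# (hypothetical data, not read from the dataset).
mask_a = np.zeros((4, 4), dtype=np.uint8)
mask_a[0:2, 0:2] = 255
mask_b = np.zeros((4, 4), dtype=np.uint8)
mask_b[1:3, 1:3] = 255

patient_mask = union_mask([mask_a, mask_b])
```

Taking the union discards the abnormality type of each region; a pipeline that needs per-type labels would keep the masks separate or stack them as channels instead.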