Content-based image retrieval with a Convolutional Siamese Neural Network: Distinguishing lung cancer and tuberculosis in CT images

Comput Biol Med. 2022 Jan:140:105096. doi: 10.1016/j.compbiomed.2021.105096. Epub 2021 Nov 30.

Abstract

Background: CT findings of lung cancer and tuberculosis are sometimes similar, potentially leading to misdiagnosis. This study aims to combine deep learning and content-based image retrieval (CBIR) to distinguish lung cancer (LC) from nodular/mass atypical tuberculosis (NMTB) in CT images.

Methods: This study proposes CBIR with a convolutional Siamese neural network (CBIR-CSNN). First, lesion patches are cropped to form the LC and NMTB datasets, and pairs of arbitrary patches form a patch-pair dataset. Second, the patch-pair dataset is used to train a CSNN. Third, a test patch is treated as a query: the trained CSNN computes the distance between the query and 20 patches in both datasets, and the patches closest to the query give the final prediction by majority voting. A dataset of 719 patients is used to train and test the CBIR-CSNN, and an external dataset of 30 patients is used to validate it.
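The retrieval and voting steps described in the Methods can be illustrated with a minimal sketch. It assumes that the trained CSNN has already mapped each patch to an embedding vector and that distance is Euclidean; the abstract specifies neither. The function names make_pairs and retrieve_and_vote, the same-class pair label, the toy embeddings, and the value of k are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def make_pairs(labels):
    """Build every unordered patch pair as (i, j, same_class).

    Assumption: a pair is labelled 1 when both patches share a class
    and 0 otherwise, which is the usual target for Siamese training.
    """
    return [(i, j, int(labels[i] == labels[j]))
            for i, j in combinations(range(len(labels)), 2)]


def retrieve_and_vote(query_emb, gallery_embs, gallery_labels, k=5):
    """Rank gallery patches by distance to the query and vote among the k closest.

    Assumption: embeddings come from the trained network and are compared
    with Euclidean distance. An odd k avoids ties between the two classes.
    """
    dists = np.linalg.norm(gallery_embs - query_emb, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(gallery_labels[i] for i in nearest)
    return votes.most_common(1)[0][0], nearest


# Toy two-dimensional embeddings standing in for CSNN outputs (illustrative only).
gallery_embs = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
                         [1.0, 1.0], [0.9, 1.1], [1.2, 1.0]])
gallery_labels = ["LC", "LC", "LC", "NMTB", "NMTB", "NMTB"]
query = np.array([0.95, 1.0])

predicted, neighbours = retrieve_and_vote(query, gallery_embs, gallery_labels, k=3)
print(predicted, neighbours)
```

Because the prediction is a vote over retrieved patches, those patches can be shown alongside the query, which is one plausible reading of the visual explainability the abstract mentions.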

Results: At the patch level, the CBIR-CSNN achieves a mean average precision (mAP) of 0.953, an accuracy of 0.947, and an area under the curve (AUC) of 0.970. At the patient level, it correctly predicts all labels. On the external dataset, it achieves an accuracy of 0.802 and an AUC of 0.858 at the patch level, and an accuracy of 0.833 and an AUC of 0.902 at the patient level.
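For readers unfamiliar with mAP as a retrieval metric, the following sketch shows one common way to compute it. The abstract does not state the exact definition used, so the normalization by the number of relevant patches retrieved is an assumption, and the function names and toy rankings are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance[i] is 1 if the i-th retrieved patch shares the
    query's class and 0 otherwise, in order of increasing distance.
    Assumption: precision is averaged over the relevant patches retrieved.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_rankings):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)


# Two illustrative queries, each with four retrieved patches.
rankings = [[1, 1, 0, 1], [0, 1, 1, 0]]
map_score = mean_average_precision(rankings)
print(round(map_score, 3))
```

A ranking that places all same-class patches first scores 1.0, so an mAP near 1 indicates that the nearest retrieved patches almost always share the query's class.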

Conclusions: This CBIR-CSNN can accurately and automatically distinguish LC from NMTB using CT images. CBIR-CSNN has excellent representation capability, compatibility with few-shot learning, and visual explainability.

Keywords: Content-based imaging retrieval; Lung cancer; Nodular/mass atypical pulmonary tuberculosis; Siamese network.