Deep-Learning-Based Real-Time and Automatic Target-to-Background Ratio Calculation in Fluorescence Endoscopy for Cancer Detection and Localization

Diagnostics (Basel). 2022 Aug 23;12(9):2031. doi: 10.3390/diagnostics12092031.

Abstract

Esophageal adenocarcinoma (EAC) is a deadly cancer that is rising rapidly in incidence. Early detection of EAC followed by curative intervention greatly improves patient prognosis. A scanning fiber endoscope (SFE) using fluorescence-labeled peptides that bind rapidly to epidermal growth factor receptors has shown promising performance for early EAC detection. Target-to-background (T/B) ratios were calculated to quantify the fluorescence images for neoplastic lesion classification. This T/B calculation is generally based on lesion segmentation with the Chan-Vese algorithm, which may require hyperparameter adjustment when segmenting frames with different brightness and contrast, impeding automation for real-time video. Deep learning models are more robust to such variations, but accurate pixel-level segmentation ground truth is difficult to establish in the medical field. Since the ground truth in our dataset contained only frame-level diagnoses, we propose a computer-aided diagnosis (CAD) system that calculates the T/B ratio in real time. A two-step process using convolutional neural networks (CNNs) was developed to automate suspicious-frame selection and lesion segmentation for the T/B calculation. In Step 1, we designed and trained deep CNNs to select suspicious frames, using a diverse and representative set of 3427 SFE images collected from 25 patient videos from two clinical trials. We tested the models on 1039 images from 10 different SFE patient videos and achieved a sensitivity of 96.4%, a specificity of 96.6%, a precision of 95.5%, and an area under the receiver operating characteristic curve of 0.989. In Step 2, 1006 frames containing suspicious lesions were used to train the fluorescence-target segmentation models; the lesion labels were generated with a manually tuned Chan-Vese algorithm applied to the labeled and predicted suspicious frames from Step 1. The segmentation models were tested on two clinical datasets of 100 SFE frames each and achieved mean intersection-over-union values of 0.89 and 0.88, respectively. The T/B ratios computed from our segmentation results were similar to those from the manually tuned Chan-Vese algorithm (1.71 ± 0.22 vs. 1.72 ± 0.28; p = 0.872). With a graphics processing unit (GPU), the proposed two-step CAD system achieved 50 fps for frame selection and 15 fps for segmentation and T/B calculation, showing that the frame rejection in Step 1 improves diagnostic efficiency. With the T/B ratio as a real-time indicator, this CAD system is designed to guide biopsies and surgeries and to serve as a reliable second observer that localizes and outlines suspicious lesions highlighted by fluorescence probes topically applied in organs where cancer originates in the epithelia.
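
The abstract does not spell out the T/B formula; a common definition in quantitative fluorescence imaging is the mean fluorescence intensity inside the segmented lesion (target) divided by the mean intensity of the remaining pixels (background). The minimal Python sketch below illustrates that assumed definition together with the two-step gating described above. The names `target_to_background_ratio`, `is_suspicious`, `segment_lesion`, and `process_video` are hypothetical placeholders for the Step 1 frame classifier and Step 2 segmentation CNN, not the authors' published code.

```python
import numpy as np

def target_to_background_ratio(fluor: np.ndarray, lesion_mask: np.ndarray) -> float:
    """T/B ratio, assumed here to be the mean fluorescence intensity inside
    the lesion mask divided by the mean intensity outside it.

    fluor       -- 2-D array of fluorescence-channel pixel intensities
    lesion_mask -- boolean array of the same shape; True marks lesion pixels
    """
    target = fluor[lesion_mask]
    background = fluor[~lesion_mask]
    if target.size == 0 or background.size == 0:
        raise ValueError("target and background regions must both be non-empty")
    return float(target.mean() / background.mean())

def process_video(frames, is_suspicious, segment_lesion):
    """Two-step gating: the cheaper frame classifier (Step 1) runs on every
    frame; only frames it flags as suspicious reach segmentation and T/B
    calculation (Step 2)."""
    ratios = []
    for frame in frames:
        if not is_suspicious(frame):      # Step 1: CNN suspicious-frame selection
            continue
        mask = segment_lesion(frame)      # Step 2: CNN lesion segmentation
        ratios.append(target_to_background_ratio(frame, mask))
    return ratios
```

Gating in this order matches the throughput figures reported above (50 fps for frame selection vs. 15 fps for segmentation and T/B calculation): rejecting non-suspicious frames early keeps the slower segmentation stage off most of the video.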

Keywords: CAD; cancer detection; deep learning; multimodal fluorescence endoscopy; real time; segmentation.