Development and multi-institutional validation of a deep learning model for grading of vesicoureteral reflux on voiding cystourethrogram: a retrospective multicenter study

EClinicalMedicine. 2024 Feb 9:69:102466. doi: 10.1016/j.eclinm.2024.102466. eCollection 2024 Mar.

Abstract

Background: Voiding cystourethrography (VCUG) is the gold standard for the diagnosis and grading of vesicoureteral reflux (VUR). However, VUR grading from voiding cystourethrograms is highly subjective and has low reliability. This study aimed to develop a deep learning model to improve the reliability of VUR grading on VCUG and to compare its performance with that of clinicians.

Methods: In this retrospective study in China, VCUG images were collected between January 2019 and September 2022. Images from our institution formed the internal dataset, and images from 4 external datasets formed the external testing set. The internal dataset was divided into training (N = 1000), validation (N = 500), and internal testing (N = 168) sets; the external testing set comprised 280 images. An ensemble learning-based model, Deep-VCUG, was developed using ResNet-101 and a voting method to predict VUR grade. Grading performance was assessed in the internal and external testing sets using heatmaps, area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, and F1 score. The performance of four clinicians (2 pediatric urologists and 2 radiologists) in predicting VUR grade, with and without Deep-VCUG assistance, was evaluated in the external testing set.
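
As an illustration of the voting step, the following is a minimal sketch of hard majority voting over per-model grade predictions. It is not the Deep-VCUG implementation: the number of models, the grade encoding (0 for no reflux, 1 to 5 for VUR grade), the tie-breaking rule, and the function names `majority_vote` and `ensemble_predict` are all assumptions made for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent grade among the models' predictions.

    Ties are broken by choosing the lowest grade (an arbitrary rule
    assumed here for illustration only).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    return min(grade for grade, count in counts.items() if count == top)


def ensemble_predict(per_model_predictions):
    """Combine predictions from several models, image by image.

    per_model_predictions: list of lists, one inner list per model,
    each holding one predicted grade per image.
    """
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical predictions from three models for four images.
model_a = [0, 3, 5, 2]
model_b = [0, 3, 4, 1]
model_c = [1, 2, 4, 3]

ensemble = ensemble_predict([model_a, model_b, model_c])
print(ensemble)
```

In this sketch the last image receives three different votes, so the assumed tie-break decides the result; a soft-voting variant would average class probabilities instead.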

Findings: A total of 1948 VCUG images were collected (internal dataset = 1668; multicenter external dataset = 280). For unilateral VUR grading, Deep-VCUG achieved AUCs of 0.962 (95% confidence interval [CI]: 0.943-0.978) and 0.944 (95% CI: 0.921-0.964) in the internal and external testing sets, respectively. For bilateral VUR grading, Deep-VCUG also achieved high AUCs of 0.960 (95% CI: 0.922-0.983) and 0.924 (95% CI: 0.887-0.957). The Deep-VCUG model with the voting method outperformed both a single model and the clinicians in classifying VCUG images. Moreover, with Deep-VCUG assistance, the classification ability of both junior and senior clinicians improved significantly.
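
For readers unfamiliar with how an AUC and its 95% CI can be obtained, the sketch below computes a binary AUC from its rank interpretation and a percentile bootstrap interval. The abstract does not state how the reported intervals were computed, so the bootstrap, the one-vs-rest binary framing, the toy data, and the function names `auc` and `bootstrap_ci` are assumptions for illustration only.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = target grade present, 0 = absent (hypothetical values).
labels = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.5]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(point, lower, upper)
```

With only eight toy cases the interval is very wide; the narrow intervals reported above reflect much larger testing sets.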

Interpretation: The Deep-VCUG model is a generalizable, objective, and accurate tool for vesicoureteral reflux grading based on VCUG imaging, and it effectively assisted clinicians in VUR grading.

Funding: This study was supported by the Natural Science Foundation of China; the "Fuqing Scholar" Student Scientific Research Program of Shanghai Medical College, Fudan University; and the Program of Greater Bay Area Institute of Precision Medicine (Guangzhou).

Keywords: Bilateral vesicoureteral reflux; Deep learning; Ensemble learning; Voiding cystourethrography.