Deep learning detection of diabetic retinopathy in Scotland's diabetic eye screening programme

Alan D Fleming; Joseph Mellor; Stuart J McGurnaghan; Luke A K Blackbourn; Keith A Goatman; Caroline Styles; Amos J Storkey; Paul M McKeigue; Helen M Colhoun

doi:10.1136/bjo-2023-323395

Br J Ophthalmol. 2023 Sep 13:bjo-2023-323395. doi: 10.1136/bjo-2023-323395. Online ahead of print.

Affiliations

¹ The Institute of Genetics and Cancer, University of Edinburgh Western General Hospital, Edinburgh, UK.
² Usher Institute, The University of Edinburgh, Edinburgh, UK joe.mellor@ed.ac.uk.
³ King's College, Aberdeen, UK.
⁴ Queen Margaret Hospital, NHS Fife, Dunfermline, Fife, UK.
⁵ School of Informatics, The University of Edinburgh, Edinburgh, UK.
⁶ Usher Institute, The University of Edinburgh, Edinburgh, UK.

# Contributed equally.

PMID: 37704266

Abstract

Background/aims: Support vector machine-based automated grading (known as iGradingM) has been shown to be safe, cost-effective and robust in Scotland's diabetic eye screening (DES) programme for diabetic retinopathy (DR). It triages screening episodes into two groups: gradable with no DR, or manual grading required. The aim of this study was to develop a deep learning-based autograder using images and gradings from DES and to compare its performance with that of iGradingM.

Methods: Retinal images, quality assurance (QA) data and routine DR grades were obtained from national datasets for 179 944 patients for the years 2006-2016. QA grades were available for 744 images. We developed a deep learning-based algorithm to detect whether either eye contained ungradable images or any DR. Sensitivity and specificity were evaluated against consensus QA grades and against routine grades.
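
The episode-level decision described in the Methods can be illustrated with a minimal sketch. It assumes that per-image outputs have already been reduced to two boolean flags; the `ImageResult` type, its field names and the `needs_manual_grading` helper are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ImageResult:
    """Hypothetical per-image output of an autograder (illustrative only)."""
    eye: str          # "left" or "right"
    ungradable: bool  # image judged ungradable
    any_dr: bool      # image judged to show any DR


def needs_manual_grading(images: Iterable[ImageResult]) -> bool:
    """Flag the episode if any image in either eye is ungradable or shows DR."""
    return any(img.ungradable or img.any_dr for img in images)


episode = [
    ImageResult(eye="left", ungradable=False, any_dr=False),
    ImageResult(eye="right", ungradable=False, any_dr=True),
]
print(needs_manual_grading(episode))
```

Under this rule a single flagged image in either eye is enough to send the whole episode to manual grading, which matches the triage described in the abstract but says nothing about how the study's model produces its per-image outputs.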

Results: On the images used in QA, deep learning detected images that were ungradable or showed DR with better specificity than manual graders (p<0.001) and than iGradingM (p<0.001) at the same sensitivities. Any DR according to the DES final grade was detected with 89.19% (270 392/303 154) sensitivity and 77.41% (500 945/647 158) specificity. Observable disease and referable disease were detected with sensitivities of 96.58% (16 613/17 201) and 98.48% (22 600/22 948), respectively. Overall, 43.84% of screening episodes would require manual grading.
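
The reported percentages can be checked from the counts given in the Results. The sketch below recomputes them and makes one assumption that the abstract does not state explicitly: that the manual grading share is the number of episodes flagged by the system (true positives plus false positives for any DR) divided by all episodes. The variable names are illustrative.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimal places."""
    return round(100 * numerator / denominator, 2)


# Counts reported in the Results (any DR according to the DES final grade).
true_pos, with_dr = 270_392, 303_154
true_neg, without_dr = 500_945, 647_158

sensitivity = pct(true_pos, with_dr)
specificity = pct(true_neg, without_dr)

# Assumption: episodes needing manual grading = all episodes flagged positive.
false_pos = without_dr - true_neg
manual_share = pct(true_pos + false_pos, with_dr + without_dr)

observable_sens = pct(16_613, 17_201)
referable_sens = pct(22_600, 22_948)

print(sensitivity, specificity, manual_share, observable_sens, referable_sens)
```

Under that assumption the flagged share comes to 43.84%, which agrees with the figure reported for episodes requiring manual grading.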

Conclusion: A deep learning-based system for DR grading was evaluated on QA data and on images spanning 11 years from 50% of people attending a national DR screening programme. At the same sensitivity, the system could reduce the manual grading workload compared with the current automated grading system.

Keywords: Imaging; Public health; Retina; Telemedicine.