Variability of Grading DR Screening Images among Non-Trained Retina Specialists

J Clin Med. 2022 May 31;11(11):3125. doi: 10.3390/jcm11113125.

Abstract

Poland has never had a widespread diabetic retinopathy (DR) screening program and subsequently has no purpose-trained graders and no established grader training scheme. Herein, we compare the performance and variability of three retinal specialists with no additional DR grading training in assessing images from 335 real-life screening encounters and contrast their performance against IDx-DR, a US Food and Drug Administration (FDA) approved DR screening suite. A total of 1501 fundus images from 670 eyes were assessed by each grader with a final grade on a per-eye level. Unanimous agreement between all graders was achieved for 385 eyes, and 110 patients, out of which 98% had a final grade of no DR. Thirty-six patients had final grades higher than mild DR, out of which only two had no grader disagreements regarding severity. A total of 28 eyes underwent adjudication due to complete grader disagreement. Four patients had discordant grades ranging from no DR to severe DR between the human graders and IDx-DR. Retina specialists achieved kappa scores of 0.52, 0.78, and 0.61. Retina specialists had relatively high grader variability and only a modest concordance with IDx-DR results. Focused training and verification are recommended for any potential DR graders before assessing DR screening images.

Keywords: deep learning; diabetic retinopathy grading; diabetic retinopathy screening; grader comparison; inter-grader variability.

Grants and funding

This research received no external funding.