Variability of Grading DR Screening Images among Non-Trained Retina Specialists

Andrzej Grzybowski; Piotr Brona; Tomasz Krzywicki; Magdalena Gaca-Wysocka; Arleta Berlińska; Anna Święch

doi:10.3390/jcm11113125

J Clin Med. 2022 May 31;11(11):3125. doi: 10.3390/jcm11113125.

Authors

Andrzej Grzybowski¹ ², Piotr Brona³, Tomasz Krzywicki⁴, Magdalena Gaca-Wysocka³, Arleta Berlińska¹, Anna Święch⁵

Affiliations

¹ Department of Ophthalmology, University of Warmia and Mazury, 10-719 Olsztyn, Poland.
² Institute for Research in Ophthalmology, Foundation for Ophthalmology Development, 60-553 Poznan, Poland.
³ Department of Ophthalmology, Poznan City Hospital, Szwajcarska 3, 60-285 Poznan, Poland.
⁴ Department of Mathematical Methods of Informatics, University of Warmia and Mazury, 10-719 Olsztyn, Poland.
⁵ Department of Vitreoretinal Surgery, Medical University of Lublin, 20-093 Lublin, Poland.

Abstract

Poland has never had a widespread diabetic retinopathy (DR) screening program and consequently has no purpose-trained graders and no established grader training scheme. Herein, we compare the performance and variability of three retina specialists with no additional DR grading training in assessing images from 335 real-life screening encounters and contrast their performance against IDx-DR, a US Food and Drug Administration (FDA)-approved DR screening suite. A total of 1501 fundus images from 670 eyes were assessed by each grader, with a final grade assigned on a per-eye level. Unanimous agreement between all graders was achieved for 385 eyes and 110 patients, of which 98% had a final grade of no DR. Thirty-six patients had final grades higher than mild DR, of which only two had no grader disagreements regarding severity. A total of 28 eyes underwent adjudication due to complete grader disagreement. Four patients had discordant grades ranging from no DR to severe DR between the human graders and IDx-DR. Retina specialists achieved kappa scores of 0.52, 0.78, and 0.61. Retina specialists had relatively high grader variability and only a modest concordance with IDx-DR results. Focused training and verification are recommended for any potential DR graders before assessing DR screening images.
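For readers unfamiliar with the kappa statistic reported above, the following is a minimal sketch of how an agreement score of this kind can be computed from two sets of per-eye grades. It assumes unweighted Cohen's kappa; the abstract does not state which kappa variant or which reference grade the authors used, so this is an illustration and not the study's method. The function name `cohen_kappa` and the example grades are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical grades.

    Assumption: unweighted kappa; the source abstract does not specify the variant.
    """
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("grade lists must be non-empty and of equal length")
    n = len(grades_a)
    # Observed agreement: fraction of eyes where both graders gave the same grade.
    p_observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Chance agreement expected from each grader's marginal grade frequencies.
    freq_a = Counter(grades_a)
    freq_b = Counter(grades_b)
    p_expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both graders used one identical grade throughout; agreement is total.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical per-eye grades for two graders (illustrative only, not study data).
grader_1 = ["no DR", "no DR", "mild", "moderate", "no DR", "severe"]
grader_2 = ["no DR", "mild", "mild", "moderate", "no DR", "moderate"]
kappa = cohen_kappa(grader_1, grader_2)
print(round(kappa, 3))  # 0.538
```

Because DR severity grades are ordered, a weighted kappa that penalises large disagreements (for example, no DR versus severe DR) more than adjacent-grade disagreements may be the more natural choice in practice; the unweighted form is shown here only because it is the simplest.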

Keywords: deep learning; diabetic retinopathy grading; diabetic retinopathy screening; grader comparison; inter-grader variability.

Grants and funding

This research received no external funding.