Towards population-independent, multi-disease detection in fundus photographs

Sarah Matta; Mathieu Lamard; Pierre-Henri Conze; Alexandre Le Guilcher; Clément Lecat; Romuald Carette; Fabien Basset; Pascale Massin; Jean-Bernard Rottier; Béatrice Cochener; Gwenolé Quellec

doi:10.1038/s41598-023-38610-y

Sci Rep. 2023 Jul 17;13(1):11493. doi: 10.1038/s41598-023-38610-y.

Authors

Sarah Matta¹ ², Mathieu Lamard³ ⁴, Pierre-Henri Conze⁴ ⁵, Alexandre Le Guilcher⁶, Clément Lecat⁶, Romuald Carette⁶, Fabien Basset⁶, Pascale Massin⁷, Jean-Bernard Rottier⁸, Béatrice Cochener³ ⁴ ⁹, Gwenolé Quellec⁴

Affiliations

¹ Université de Bretagne Occidentale, Brest, Bretagne, France. sarah.matta@univ-brest.fr.
² INSERM, UMR 1101, Brest, F-29200, France. sarah.matta@univ-brest.fr.
³ Université de Bretagne Occidentale, Brest, Bretagne, France.
⁴ INSERM, UMR 1101, Brest, F-29200, France.
⁵ IMT Atlantique, Brest, F-29200, France.
⁶ Evolucare Technologies, Villers-Bretonneux, F-80800, France.
⁷ Service d'Ophtalmologie, Hôpital Lariboisière, APHP, Paris, F-75475, France.
⁸ Bâtiment de consultation porte 14 Pôle Santé Sud CMCM, 28 Rue de Guetteloup, Le Mans, F-72100, France.
⁹ Service d'Ophtalmologie, CHRU Brest, Brest, F-29200, France.

Abstract

Independent validation studies of automatic diabetic retinopathy screening systems have recently shown a drop in screening performance on external data. Beyond diabetic retinopathy, this study investigates the generalizability of deep learning (DL) algorithms for screening various ocular anomalies in fundus photographs, across heterogeneous populations and imaging protocols. The following datasets are considered: OPHDIAT (France, diabetic population), OphtaMaine (France, general population), RIADD (India, general population) and ODIR (China, general population). Two multi-disease DL algorithms were developed: a Single-Dataset (SD) network, trained on the largest dataset (OPHDIAT), and a Multiple-Dataset (MD) network, trained on multiple datasets simultaneously. To assess their generalizability, both algorithms were evaluated in two settings: when training and test data originate from overlapping datasets, and when they originate from disjoint datasets. The SD network achieved a mean per-disease area under the receiver operating characteristic curve (mAUC) of 0.9571 on OPHDIAT. However, it generalized poorly to the other three datasets (mAUC < 0.9). When all four datasets were involved in training, the MD network significantly outperformed the SD network (p = 0.0058), indicating improved generality. However, in leave-one-dataset-out experiments, performance of the MD network was significantly lower on populations unseen during training than on populations involved in training (p < 0.0001), indicating imperfect generalizability.
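
As an illustration of the two evaluation notions named in the abstract, the sketch below computes a mean per-disease AUC (mAUC) and enumerates leave-one-dataset-out splits. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`auc`, `mean_auc`, `leave_one_dataset_out`), the toy labels and scores, and the use of the Mann-Whitney formulation of AUC are all hypothetical choices made here. The abstract does not specify how AUC was computed or how the splits were built.

```python
from typing import Iterator, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case (Mann-Whitney formulation; ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(label_rows: Sequence[Sequence[int]],
             score_rows: Sequence[Sequence[float]]) -> float:
    """Average the per-disease AUCs; rows are images, columns are diseases."""
    n_diseases = len(label_rows[0])
    per_disease = [
        auc([row[d] for row in label_rows], [row[d] for row in score_rows])
        for d in range(n_diseases)
    ]
    return sum(per_disease) / n_diseases


def leave_one_dataset_out(names: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training datasets, held-out test dataset) for each dataset."""
    for held_out in names:
        yield [n for n in names if n != held_out], held_out


# Toy data (hypothetical): 4 images, 2 diseases.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.7, 0.1]]
toy_mauc = mean_auc(toy_labels, toy_scores)

# Dataset names are taken from the abstract.
splits = list(leave_one_dataset_out(["OPHDIAT", "OphtaMaine", "RIADD", "ODIR"]))
```

In this toy case the per-disease AUCs are 0.75 and 1.0, so `toy_mauc` is 0.875. Each entry of `splits` pairs three training datasets with the one dataset held out for testing, which corresponds to the "populations unseen during training" setting described above.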

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Diabetic Retinopathy* / diagnostic imaging
Diagnostic Techniques, Ophthalmological
Eye Diseases* / diagnosis
Fundus Oculi
Humans
ROC Curve