Multimodal image encoding pre-training for diabetic retinopathy grading

Álvaro S Hervella; José Rouco; Jorge Novo; Marcos Ortega

doi:10.1016/j.compbiomed.2022.105302

Comput Biol Med. 2022 Apr:143:105302. doi: 10.1016/j.compbiomed.2022.105302. Epub 2022 Feb 16.

Authors

Álvaro S Hervella¹, José Rouco², Jorge Novo³, Marcos Ortega⁴

Affiliations

¹ Centro de Investigación CITIC, Universidade da Coruña, A Coruña, Spain; VARPA Research Group, Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, A Coruña, Spain. Electronic address: a.suarezh@udc.es.
² Centro de Investigación CITIC, Universidade da Coruña, A Coruña, Spain; VARPA Research Group, Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, A Coruña, Spain. Electronic address: jrouco@udc.es.
³ Centro de Investigación CITIC, Universidade da Coruña, A Coruña, Spain; VARPA Research Group, Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, A Coruña, Spain. Electronic address: jnovo@udc.es.
⁴ Centro de Investigación CITIC, Universidade da Coruña, A Coruña, Spain; VARPA Research Group, Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, A Coruña, Spain. Electronic address: mortega@udc.es.

PMID: 35219187

Abstract

Diabetic retinopathy is an increasingly prevalent eye disorder that can lead to severe vision impairment. Grading the severity of the disease from retinal images is key to providing adequate treatment. However, to learn the diverse patterns and complex relations that the grading requires, deep neural networks need very large annotated datasets, which are not always available. This has typically been addressed by reusing networks that were pre-trained for natural image classification, hence relying on additional annotated data from a different domain. In contrast, we propose a novel pre-training approach that takes advantage of the unlabeled multimodal visual data commonly available in ophthalmology. The use of multimodal visual data for pre-training has previously been explored by training a network to predict one image modality from another. However, that approach does not ensure a broad understanding of the retinal images, given that the network may focus exclusively on the similarities between modalities while ignoring the differences. Thus, we propose a novel self-supervised pre-training that explicitly teaches the network to learn both the characteristics common to the modalities and those exclusive to the input modality. This provides a complete comprehension of the input domain and facilitates the training of downstream tasks that require a broad understanding of the retinal images, such as the grading of diabetic retinopathy. To validate and analyze the proposed approach, we performed exhaustive experiments on different public datasets. The transfer learning performance for the grading of diabetic retinopathy is evaluated under different settings and compared against previous state-of-the-art pre-training approaches. A comparison against relevant state-of-the-art works for the detection and grading of diabetic retinopathy is also provided. The results show a satisfactory performance of the proposed approach, which outperforms previous pre-training alternatives in the grading of diabetic retinopathy.

Keywords: Computer-aided diagnosis; Deep learning; Diabetic retinopathy; Eye fundus; Medical imaging; Self-supervised learning.