MIL-CT: Multiple Instance Learning via a Cross-Scale Transformer for Enhanced Arterial Light Reflex Detection

Bioengineering (Basel). 2023 Aug 16;10(8):971. doi: 10.3390/bioengineering10080971.

Abstract

One of the early manifestations of systemic atherosclerosis, which leads to blood circulation issues, is the enhanced arterial light reflex (EALR). Fundus images are commonly used for regular screening purposes to intervene and assess the severity of systemic atherosclerosis in a timely manner. However, there is a lack of automated methods that can meet the demands of large-scale population screening. Therefore, this study introduces a novel cross-scale transformer-based multi-instance learning method, named MIL-CT, for the detection of early arterial lesions (e.g., EALR) in fundus images. MIL-CT utilizes the cross-scale vision transformer to extract retinal features in a multi-granularity perceptual domain. It incorporates a multi-head cross-scale attention fusion module to enhance global perceptual capability and feature representation. By integrating information from different scales and minimizing information loss, the method significantly improves the performance of the EALR detection task. Furthermore, a multi-instance learning module is implemented to enable the model to better comprehend local details and features in fundus images, facilitating the classification of patch tokens related to retinal lesions. To effectively learn the features associated with retinal lesions, we utilize weights pre-trained on a large fundus image Kaggle dataset. Our validation and comparison experiments conducted on our collected EALR dataset demonstrate the effectiveness of the MIL-CT method in reducing generalization errors while maintaining efficient attention to retinal vascular details. Moreover, the method surpasses existing models in EALR detection, achieving an accuracy, precision, sensitivity, specificity, and F1 score of 97.62%, 97.63%, 97.05%, 96.48%, and 97.62%, respectively. These results exhibit the significant enhancement in diagnostic accuracy of fundus images brought about by the MIL-CT method. Thus, it holds potential for various applications, particularly in the early screening of cardiovascular diseases such as hypertension and atherosclerosis.

Keywords: arteriosclerosis; cross-scale transformer; deep learning; fundus image; multiple instance learning.