Usefulness of Machine Learning for Identification of Referable Diabetic Retinopathy in a Large-Scale Population-Based Study

Cheng Yang; Qingyang Liu; Haike Guo; Min Zhang; Lixin Zhang; Guanrong Zhang; Jin Zeng; Zhongning Huang; Qianli Meng; Ying Cui

doi:10.3389/fmed.2021.773881

Front Med (Lausanne). 2021 Dec 9;8:773881. doi: 10.3389/fmed.2021.773881. eCollection 2021.

Authors

Cheng Yang¹, Qingyang Liu², Haike Guo³,⁴, Min Zhang², Lixin Zhang⁵, Guanrong Zhang⁶, Jin Zeng¹, Zhongning Huang¹, Qianli Meng¹, Ying Cui¹

Affiliations

¹ Department of Ophthalmology, Guangdong Provincial People's Hospital, Guangdong Eye Institute, Guangdong Academy of Medical Sciences, Guangzhou, China.
² Department of Ophthalmology, Dongguan People's Hospital, Dongguan, China.
³ Shanghai Peace Eye Hospital, Shanghai, China.
⁴ Xiamen Eye Center, Xiamen University, Xiamen, China.
⁵ Department of Ophthalmology, Hengli Hospital, Dongguan, China.
⁶ Information and Statistical Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China.

Abstract

Purpose: To develop and validate machine learning-based classifiers that use simple non-ocular metrics to detect referable diabetic retinopathy (RDR) in a large-scale Chinese population-based survey.

Methods: A total of 1,418 patients with diabetes mellitus, identified among 8,952 rural residents screened in the population-based Dongguan Eye Study, were used for model development and validation. Eight algorithms [extreme gradient boosting (XGBoost), random forest, naïve Bayes, k-nearest neighbor (KNN), AdaBoost, LightGBM, artificial neural network (ANN), and logistic regression] were used to build models for detecting RDR in individuals with diabetes. The area under the receiver operating characteristic curve (AUC) and its 95% confidence interval (95% CI) were estimated using five-fold cross-validation as well as an 80:20 split into training and validation sets.

Results: The 10 most important features in the machine learning models were duration of diabetes, HbA1c, systolic blood pressure, triglycerides, body mass index, serum creatinine, age, educational level, duration of hypertension, and income level. Based on these top 10 variables, the XGBoost model achieved the best discriminative performance, with an AUC of 0.816 (95% CI: 0.812, 0.820). The AUCs for logistic regression, AdaBoost, naïve Bayes, and random forest were 0.766 (95% CI: 0.756, 0.776), 0.754 (95% CI: 0.744, 0.764), 0.753 (95% CI: 0.743, 0.763), and 0.705 (95% CI: 0.697, 0.713), respectively.

Conclusions: A machine learning-based classifier that used 10 easily obtained non-ocular variables was able to effectively detect patients with RDR. The importance scores of the variables provide insight into preventing the occurrence of RDR. Screening for RDR with machine learning provides a useful complementary tool for clinical practice in resource-poor areas with limited ophthalmic infrastructure.

Keywords: XGBoost; classifier; diabetic retinopathy; machine learning; population-based study.
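The Methods estimate each classifier's AUC and 95% CI with five-fold cross-validation. The following is a minimal Python sketch of that kind of evaluation loop, written under stated assumptions rather than as the authors' pipeline. The abstract does not say how the confidence intervals were computed, so the fold-level normal approximation used here (mean ± 1.96 × SD/√k) is a hypothetical choice. The scorer `fit_centroid_score` is a deliberately simple placeholder, not XGBoost or any of the eight algorithms in the study, and all function names are illustrative. AUC is computed in its Mann-Whitney form: the probability that a randomly chosen RDR case receives a higher score than a randomly chosen non-RDR case, with ties counted as one half.

```python
import math
import random
from statistics import mean, stdev


def auc(y_true, scores):
    """AUC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k=5, seed=0):
    """Split row indices into k folds, keeping the RDR/non-RDR ratio similar in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal indices round-robin across folds
    return folds


def fit_centroid_score(X, y):
    """Hypothetical placeholder model (NOT XGBoost): weight each feature by the
    difference between its mean in the positive and negative classes."""
    pos = [x for x, v in zip(X, y) if v == 1]
    neg = [x for x, v in zip(X, y) if v == 0]
    return [mean(r[j] for r in pos) - mean(r[j] for r in neg) for j in range(len(X[0]))]


def predict_score(weights, X):
    """Linear risk score for each row; higher means more likely RDR."""
    return [sum(w * v for w, v in zip(weights, x)) for x in X]


def cv_auc(X, y, k=5, seed=0):
    """k-fold cross-validated AUC with an assumed normal-approximation 95% CI across folds."""
    fold_aucs = []
    for test_idx in stratified_folds(y, k, seed):
        test = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test]
        w = fit_centroid_score([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = predict_score(w, [X[i] for i in test_idx])
        fold_aucs.append(auc([y[i] for i in test_idx], scores))
    m = mean(fold_aucs)
    half = 1.96 * stdev(fold_aucs) / math.sqrt(k)
    return m, (m - half, m + half), fold_aucs


if __name__ == "__main__":
    # Synthetic toy data only (not Dongguan Eye Study data): two hypothetical
    # features loosely standing in for diabetes duration and HbA1c.
    rng = random.Random(42)
    X, y = [], []
    for _ in range(400):
        label = 1 if rng.random() < 0.2 else 0
        X.append([rng.gauss(8 + 4 * label, 4), rng.gauss(7 + 1 * label, 1.5)])
        y.append(label)
    m, ci, _ = cv_auc(X, y, k=5)
    print(f"5-fold AUC {m:.3f} (95% CI {ci[0]:.3f}, {ci[1]:.3f})")
```

To compare algorithms as the study did, one would swap `fit_centroid_score` and `predict_score` for real models (for example, a gradient-boosted tree or a logistic regression from a library of choice) while keeping the same folds for every algorithm. With only five folds, a t-based multiplier would give a more conservative interval than the 1.96 used in this sketch.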