Artificial intelligence-based screening for amblyopia and its risk factors: comparison with four classic stereovision tests

Front Med (Lausanne). 2023 Dec 22:10:1294559. doi: 10.3389/fmed.2023.1294559. eCollection 2023.

Abstract

Introduction: The development of cost-effective and sensitive screening solutions to prevent amblyopia and identify its risk factors (strabismus, refractive problems or mixed) is a significant priority of pediatric ophthalmology. The main objective of our study was to compare the classification performance of various vision screening tests, including classic, stereoacuity-based tests (Lang II, TNO, Stereo Fly, and Frisby), and non-stereoacuity-based, low-density static, dynamic, and noisy anaglyphic random dot stereograms. We determined whether the combination of non-stereoacuity-based tests integrated in the simplest artificial intelligence (AI) model could be an alternative method for vision screening.

Methods: Our study, conducted in Spain and Hungary, is a non-experimental, cross-sectional diagnostic test assessment focused on pediatric eye conditions. Using convenience sampling, we enrolled 423 children aged 3.6-14 years, diagnosed with amblyopia, strabismus, or refractive errors, and compared them to age-matched emmetropic controls. Comprehensive pediatric ophthalmologic examinations ascertained diagnoses. Participants used filter glasses for stereovision tests and red-green goggles for an AI-based test over their prescribed glasses. Sensitivity, specificity, and the area under the ROC curve (AUC) were our metrics, with sensitivity being the primary endpoint. AUCs were analyzed using DeLong's method, and binary classifications (pathologic vs. normal) were evaluated using McNemar's matched pair and Fisher's nonparametric tests.
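As an illustration of the metrics named above, the following minimal sketch computes sensitivity, specificity, and AUC for a single screening test. It is not the study's analysis code: the function names, the 0.5 cut-off, and the toy labels and scores are hypothetical and chosen only to make the definitions concrete. The AUC is computed as the proportion of pathologic/normal pairs in which the pathologic case receives the higher score, with ties counted as one half.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions.

    labels and predictions are sequences of 0 (normal) or 1 (pathologic).
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # detected cases
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # missed cases
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # correctly passed
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false referrals
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.5, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed cut-off

sens, spec = sensitivity_specificity(labels, predictions)
area = auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.4f}")
```

The pairwise form of the AUC is quadratic in the number of subjects, which is acceptable for a sample of a few hundred children; a rank-based formulation would be preferable for much larger datasets.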

Results: Four non-overlapping groups were studied: (1) amblyopia (n = 46), (2) amblyogenic (n = 55), (3) non-amblyogenic (n = 128), and (4) emmetropic (n = 194), along with a fifth group combining the amblyopia and amblyogenic groups. Based on AUCs, the AI combination of non-stereoacuity-based tests showed significantly better performance (0.908, 95% CI: 0.829-0.958) for detecting amblyopia and its risk factors than most classical tests (Lang II: 0.704, 0.648-0.755; Stereo Fly: 0.780, 0.714-0.837; Frisby: 0.754, 0.688-0.812; p < 0.02, n = 91, DeLong's method). At the optimum ROC point, McNemar's test indicated significantly higher sensitivity, in accordance with the AUCs. Moreover, the AI solution also had significantly higher sensitivity than TNO (p = 0.046, N = 134, Fisher's test), while specificity did not differ.
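To make the paired sensitivity comparison concrete, the sketch below implements an exact two-sided McNemar test. It assumes the comparison is restricted to pathologic subjects who took both tests, so that only discordant pairs (cases detected by one test but missed by the other) carry information. The function name and the counts are hypothetical; the abstract does not report the discordant counts or state whether the exact or the chi-square form of the test was used.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: subjects positive on test A but negative on test B
    c: subjects negative on test A but positive on test B
    Under the null hypothesis, each discordant pair falls in either
    cell with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, not taken from the study: among pathologic
# children, 10 were detected only by test A and 2 only by test B.
p_value = mcnemar_exact(10, 2)
print(f"exact McNemar p = {p_value:.4f}")
```

Concordant pairs (cases detected or missed by both tests) do not enter the calculation, which is why the test suits comparisons made on the same children.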

Discussion: Combining multiple tests that use anaglyphic random dot stereograms with varying parameters (density, noise, dynamism) in an AI model leads to the most advanced and sensitive screening test for identifying amblyopia and amblyogenic conditions among all the tests studied.

Keywords: ROC (receiver operating characteristic) analysis; amblyogenic conditions; amblyopia; amblyopia risk factors; artificial intelligence – AI; cost-effective; screening; strabism.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. Hungarian Brain Research Program 2 (2017–1.2.1.-NKP2017) (GJ, PB). Thematic Excellence Program 2021 Health Sub-programme of the Ministry for Innovation and Technology in Hungary, within the framework of the EGA-16 project of the University of Pécs (TKP2021-EGA-16) (GJ, PB). OTKA K108747 (PB). New National Excellence Program of the Ministry for Innovation and Technology (ÚNKP-19-3) (ZC). Ministry of Economy, Industry and Competitiveness of Spain within the program Ramón y Cajal, RYC-2016-20471 (DP). The funding organizations had no role in the design or conduct of this research.