Image Captioning for the Visually Impaired and Blind: A Recipe for Low-Resource Languages

Batyr Arystanbekov; Askat Kuzdeuov; Shakhizat Nurgaliyev; Huseyin Atakan Varol

doi:10.1109/EMBC40787.2023.10340575

Annu Int Conf IEEE Eng Med Biol Soc. 2023 Jul:2023:1-4. doi: 10.1109/EMBC40787.2023.10340575.

PMID: 38083226

Abstract

Visually impaired and blind people often face a range of socioeconomic problems that can make it difficult for them to live independently and participate fully in society. Advances in machine learning open new avenues for implementing assistive devices for this population. In this work, we combined image captioning and text-to-speech technologies to create such a device. Our system provides the user with descriptive auditory feedback in the Kazakh language on a scene acquired in real time by a head-mounted camera. The image captioning model for the Kazakh language provided satisfactory results in both quantitative metrics and subjective evaluation. Finally, experiments with a blindfolded sighted participant demonstrated the feasibility of our approach.

MeSH terms

Blindness
Humans
Language
Machine Learning
Self-Help Devices*
Visually Impaired Persons*