Empowering Deaf-Hearing Communication: Exploring Synergies between Predictive and Generative AI-Based Strategies towards (Portuguese) Sign Language Interpretation

Telmo Adão; João Oliveira; Somayeh Shahrabadi; Hugo Jesus; Marco Fernandes; Ângelo Costa; Vânia Ferreira; Martinho Fradeira Gonçalves; Miguel A Guevara Lopéz; Emanuel Peres; Luís Gonzaga Magalhães

doi:10.3390/jimaging9110235

J Imaging. 2023 Oct 25;9(11):235. doi: 10.3390/jimaging9110235.

Affiliations

¹ Department of Engineering, School of Sciences and Technology, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal.
² ALGORITMI Research Centre/LASI, University of Minho, 4800-058 Guimarães, Portugal.
³ Centro de Computação Gráfica-CCG/zgdv, University of Minho, Campus de Azurém, Edifício 14, 4800-058 Guimarães, Portugal.
⁴ Polytechnic Institute of Bragança, School of Communication, Administration and Tourism, Campus do Cruzeiro, 5370-202 Mirandela, Portugal.
⁵ Associação Portuguesa de Surdos (APS), 1600-796 Lisboa, Portugal.
⁶ Instituto Politécnico de Setúbal, Escola Superior de Tecnologia de Setúbal, 2914-508 Setúbal, Portugal.
⁷ Centre for the Research and Technology of Agro-Environmental and Biological Sciences, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal.
⁸ Institute for Innovation, Capacity Building and Sustainability of Agri-Food Production, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal.

Abstract

Communication between Deaf and hearing individuals remains a persistent challenge that must be addressed to foster inclusivity. Despite notable efforts to develop digital solutions for sign language recognition (SLR), several issues persist, such as cross-platform interoperability and strategies for tokenizing signs to enable continuous conversations and coherent sentence construction. To address these issues, this paper proposes a non-invasive Portuguese Sign Language (Língua Gestual Portuguesa, or LGP) interpretation system-as-a-service that leverages skeletal posture sequence inference powered by long short-term memory (LSTM) architectures. To mitigate the scarcity of examples during machine learning (ML) model training, dataset augmentation strategies are explored. Additionally, a buffer-based interaction technique is introduced to facilitate the tokenization of LGP terms. This technique provides real-time feedback to users, allowing them to gauge the time remaining to complete a sign, which aids in the construction of grammatically coherent sentences from the inferred terms/words. To support human-like conditioning rules for interpretation, a large language model (LLM) service is integrated. Experiments reveal that LSTM-based neural networks, trained with 50 LGP terms and subjected to data augmentation, achieved accuracy levels ranging from 80% to 95.6%. Users unanimously reported that the buffer-based interaction strategy for terms/words tokenization was highly intuitive. Furthermore, tests with an LLM, specifically ChatGPT, demonstrated promising semantic correlation rates between generated and expected sentences.

Keywords: Portuguese Sign Language; deaf-hearing communication; generative pre-trained transformer (GPT); inclusion; large language models (LLM); long short-term memory (LSTM); machine learning (ML); sign language recognition (SLR); video-based motion analytics.
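
The buffer-based tokenization idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `SignBuffer`, the fixed frame capacity, the flattened frame layout, and the `classify` callback (a stand-in for the LSTM-based inference step) are all assumptions made for illustration. The sketch collects posture frames until the buffer is full, reports the fraction still remaining (which could drive the real-time feedback shown to the user), and emits one term per full buffer.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Sequence

# Hypothetical frame layout: one flattened skeletal posture per video frame.
Frame = Sequence[float]


class SignBuffer:
    """Collects a fixed number of frames, then hands them to a classifier."""

    def __init__(self, capacity: int, classify: Callable[[List[Frame]], str]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.classify = classify  # stand-in for the sequence model
        self.frames: deque = deque()

    def remaining(self) -> float:
        """Fraction of the buffer still empty, usable as progress feedback."""
        return 1.0 - len(self.frames) / self.capacity

    def push(self, frame: Frame) -> Optional[str]:
        """Add a frame; return a term once the buffer is full, else None."""
        self.frames.append(frame)
        if len(self.frames) < self.capacity:
            return None
        term = self.classify(list(self.frames))
        self.frames.clear()  # start collecting the next sign
        return term


def tokenize(stream: Iterable[Frame], buffer: SignBuffer) -> List[str]:
    """Turn a continuous frame stream into a list of inferred terms."""
    terms = []
    for frame in stream:
        term = buffer.push(frame)
        if term is not None:
            terms.append(term)
    return terms
```

In a setup like the one the abstract outlines, the resulting list of terms would be the input handed to the language model for sentence construction; how that hand-off is done is not specified here.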
