Synthetic Corpus Generation for Deep Learning-Based Translation of Spanish Sign Language

Sensors (Basel). 2024 Feb 24;24(5):1472. doi: 10.3390/s24051472.

Abstract

Sign language serves as the primary mode of communication for the deaf community. As technology advances, it is crucial to develop systems that improve communication between deaf and hearing individuals. This paper reviews recent state-of-the-art methods in sign language recognition, translation, and production. Additionally, we introduce a rule-based system, called ruLSE, for generating synthetic datasets in Spanish Sign Language. To assess the usefulness of these datasets, we conduct experiments with two state-of-the-art Transformer-based models, MarianMT and Transformer-STMC. In general, the former achieves better results (+3.7 points in the BLEU-4 metric), although the latter is up to four times faster. Furthermore, using pre-trained Spanish word embeddings improves results. In Sign Language Production tasks, the rule-based system demonstrates superior performance and efficiency compared to the Transformer models. Lastly, we contribute to the state of the art by releasing the generated synthetic Spanish dataset, named synLSE.
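For readers unfamiliar with the BLEU-4 metric used in the comparison above, the following is a minimal sketch of how such a score can be computed for a single sentence pair. It is not the authors' evaluation code: it assumes one reference per hypothesis, whitespace tokenization, no smoothing, and sentence-level rather than corpus-level scoring, and the function name `bleu4` and the gloss strings are hypothetical illustrations. Scores are on a 0 to 100 scale, the scale on which a difference of a few "points" is usually reported.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(reference, hypothesis):
    """Sentence-level BLEU-4 (0-100), single reference, no smoothing.

    Illustrative assumption only: published results are normally
    corpus-level and computed with an established toolkit.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the four modified n-gram precisions.
    return 100 * bp * math.exp(sum(log_precisions) / 4)


# Hypothetical gloss sequences, for illustration only.
print(bleu4("YO CASA IR MAÑANA", "YO CASA IR MAÑANA"))
```

An exact match scores 100, and a hypothesis that shares no 4-gram with the reference scores 0 under this unsmoothed variant, which is why corpus-level aggregation or smoothing is normally preferred for short gloss sequences.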

Keywords: gloss; neural machine translation; sign language; sign language production; sign language translation; synthetic corpus.

Publication types

  • Review

MeSH terms

  • Communication
  • Deep Learning*
  • Hearing
  • Humans
  • Sign Language

Grants and funding