Toward a Vision-Based Intelligent System: A Stacked Encoded Deep Learning Framework for Sign Language Recognition

Sensors (Basel). 2023 Nov 9;23(22):9068. doi: 10.3390/s23229068.

Abstract

Sign language recognition, an essential interface between the hearing and deaf-mute communities, faces challenges from high false positive rates and computational costs, even with advanced deep learning techniques. Our proposed solution is a stacked encoded model, combining artificial intelligence (AI) with the Internet of Things (IoT), which refines feature extraction and classification to overcome these challenges. We leverage a lightweight backbone model for preliminary feature extraction and use stacked autoencoders to further refine these features. Our approach harnesses the scalability of big data and shows notable improvements in accuracy, precision, recall, and F1-score, along with reduced computational complexity. The model's effectiveness is demonstrated through testing on the ArSL2018 benchmark dataset, where it outperforms state-of-the-art approaches. Additional validation through an ablation study with pre-trained convolutional neural network (CNN) models confirms its efficacy across all evaluation metrics. Our work paves the way for the sustainable development of high-performing, IoT-based sign-language-recognition applications.
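
To make the pipeline concrete, the following is a minimal PyTorch sketch of the architecture the abstract describes: a lightweight backbone extracts preliminary features, a stacked autoencoder refines them into a compact encoding, and a small classifier head predicts the sign class. The backbone choice (MobileNetV2), the layer sizes, and the 32-class output (matching the ArSL2018 alphabet classes) are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models


class StackedAutoencoder(nn.Module):
    """Refines backbone features by encoding them into a compact representation."""

    def __init__(self, in_dim=1280, hidden_dims=(512, 128)):
        super().__init__()
        # Encoder: progressively compress the backbone features.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dims[0]), nn.ReLU(),
            nn.Linear(hidden_dims[0], hidden_dims[1]), nn.ReLU(),
        )
        # Decoder: reconstructs the input; in a stacked-autoencoder workflow
        # this branch would drive unsupervised (reconstruction) pretraining.
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dims[1], hidden_dims[0]), nn.ReLU(),
            nn.Linear(hidden_dims[0], in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


class SignRecognizer(nn.Module):
    def __init__(self, num_classes=32):  # ArSL2018 covers 32 sign classes
        super().__init__()
        # Lightweight backbone; weights=None keeps the sketch self-contained,
        # but ImageNet weights could be loaded for transfer learning.
        backbone = models.mobilenet_v2(weights=None)
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.sae = StackedAutoencoder(in_dim=1280)  # MobileNetV2 emits 1280-d features
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        f = self.pool(self.features(x)).flatten(1)  # preliminary features
        _, z = self.sae(f)                          # refined, encoded features
        return self.classifier(z)                   # class logits


model = SignRecognizer()
logits = model(torch.randn(1, 3, 224, 224))  # one 224x224 RGB sign image
print(logits.shape)  # torch.Size([1, 32])
```

In practice, the autoencoder's decoder would be used only during reconstruction-based pretraining; at inference time, only the encoder path feeds the classifier, which keeps the deployed model small enough for IoT settings.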

Keywords: Arabic sign language recognition; computer vision; convolutional neural network; deep learning; image processing; machine learning.

MeSH terms

  • Artificial Intelligence*
  • Deep Learning*
  • Humans
  • Machine Learning
  • Neural Networks, Computer
  • Sign Language

Grants and funding

This research received no external funding.