Toward a Vision-Based Intelligent System: A Stacked Encoded Deep Learning Framework for Sign Language Recognition

Sensors (Basel). 2023 Nov 9;23(22):9068. doi: 10.3390/s23229068.

Abstract

Sign language recognition, an essential interface between the hearing and deaf-mute communities, faces challenges from high false positive rates and computational costs, even with advanced deep learning techniques. Our proposed solution is a stacked encoded model, combining artificial intelligence (AI) with the Internet of Things (IoT), which refines feature extraction and classification to overcome these challenges. We leverage a lightweight backbone model for preliminary feature extraction and use stacked autoencoders to further refine these features. Our approach harnesses the scalability of big data and shows notable improvements in accuracy, precision, recall, and F1-score, along with reduced computational complexity. The model's effectiveness is demonstrated through testing on the ArSL2018 benchmark dataset, where it outperforms state-of-the-art approaches. Additional validation through an ablation study with pre-trained convolutional neural network (CNN) models confirms its efficacy across all evaluation metrics. Our work paves the way for the sustainable development of high-performing, IoT-based sign-language-recognition applications.
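
To make the pipeline concrete, the following is a minimal PyTorch sketch of the architecture the abstract describes: a lightweight backbone extracts preliminary features, a stacked autoencoder refines them into a compact encoding, and a small classifier head predicts the sign class. The backbone choice (MobileNetV2), the layer sizes, and the 32-class output (matching the ArSL2018 alphabet classes) are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models


class StackedAutoencoder(nn.Module):
    """Refines backbone features by encoding them into a compact representation."""

    def __init__(self, in_dim=1280, hidden_dims=(512, 128)):
        super().__init__()
        # Encoder: progressively compress the backbone features.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dims[0]), nn.ReLU(),
            nn.Linear(hidden_dims[0], hidden_dims[1]), nn.ReLU(),
        )
        # Decoder: reconstructs the input; in a stacked-autoencoder workflow
        # this branch would drive unsupervised (reconstruction) pretraining.
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dims[1], hidden_dims[0]), nn.ReLU(),
            nn.Linear(hidden_dims[0], in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


class SignRecognizer(nn.Module):
    def __init__(self, num_classes=32):  # ArSL2018 covers 32 sign classes
        super().__init__()
        # Lightweight backbone; weights=None keeps the sketch self-contained,
        # but ImageNet weights could be loaded for transfer learning.
        backbone = models.mobilenet_v2(weights=None)
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.sae = StackedAutoencoder(in_dim=1280)  # MobileNetV2 emits 1280-d features
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        f = self.pool(self.features(x)).flatten(1)  # preliminary features
        _, z = self.sae(f)                          # refined, encoded features
        return self.classifier(z)                   # class logits


model = SignRecognizer()
logits = model(torch.randn(1, 3, 224, 224))  # one 224x224 RGB sign image
print(logits.shape)  # torch.Size([1, 32])
```

In practice, the autoencoder's decoder would be used only during reconstruction-based pretraining; at inference time, only the encoder path feeds the classifier, which keeps the deployed model small enough for IoT settings.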

Keywords: Arabic sign language recognition; computer vision; convolutional neural network; deep learning; image processing; machine learning.

MeSH terms

  • Artificial Intelligence*
  • Deep Learning*
  • Humans
  • Machine Learning
  • Neural Networks, Computer
  • Sign Language

Grants and funding

This research received no external funding.