Deep Learning Technology to Recognize American Sign Language Alphabet

Sensors (Basel). 2023 Sep 19;23(18):7970. doi: 10.3390/s23187970.

Abstract

Historically, individuals with hearing impairments have faced neglect and lacked the tools needed for effective communication. However, advancements in modern technology have paved the way for tools and software aimed at improving the quality of life of hearing-disabled individuals. This research paper presents a comprehensive study employing five distinct deep learning models to recognize hand gestures for the American Sign Language (ASL) alphabet. The primary objective of this study was to leverage contemporary technology to bridge the communication gap between hearing-impaired and hearing individuals. The models utilized in this research (AlexNet, ConvNeXt, EfficientNet, ResNet-50, and VisionTransformer) were trained and tested on an extensive dataset comprising over 87,000 images of ASL alphabet hand gestures. Numerous experiments were conducted in which the models' architectural design parameters were modified to maximize recognition accuracy. The experimental results revealed that ResNet-50 achieved an exceptional accuracy of 99.98%, the highest among all models; EfficientNet attained 99.95%, ConvNeXt 99.51%, and AlexNet 99.50%, while VisionTransformer yielded the lowest accuracy at 88.59%.
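
The following is a minimal sketch of the transfer-learning setup the abstract describes, fine-tuning an ImageNet-pretrained ResNet-50 (the study's best-performing model) in PyTorch. The dataset path, the 29-class layout (26 letters plus SPACE, DELETE, and NOTHING, as in the widely used Kaggle ASL Alphabet dataset of roughly 87,000 images), and all hyperparameters are illustrative assumptions; the abstract does not specify the authors' exact configuration.

    # Illustrative transfer-learning sketch, not the authors' published code.
    # Assumes a Kaggle-style ASL Alphabet layout:
    #   data/asl_alphabet_train/<class_name>/*.jpg  with 29 classes.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    NUM_CLASSES = 29  # assumption: 26 letters + SPACE, DELETE, NOTHING

    transform = transforms.Compose([
        transforms.Resize((224, 224)),  # input size expected by ResNet-50
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                             std=[0.229, 0.224, 0.225]),
    ])

    train_set = datasets.ImageFolder("data/asl_alphabet_train", transform=transform)
    train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

    # Start from ImageNet-pretrained weights and swap in a 29-way classifier head.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative value

    model.train()
    for epoch in range(5):  # illustrative epoch count
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

The same pattern applies to the other four architectures: load a pretrained backbone from torchvision (or, for VisionTransformer, a ViT implementation), replace the final classification layer to output 29 classes, and fine-tune on the gesture images.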

Keywords: AlexNet; American sign language; ConvNeXt; EfficientNet; ResNet-50; VisionTransformer; deep learning; image-based; transfer learning.

MeSH terms

  • Deep Learning*
  • Gestures
  • Humans
  • Quality of Life
  • Sign Language*
  • Technology
  • United States

Grants and funding

This research received no external funding.