HGR-ViT: Hand Gesture Recognition with Vision Transformer

Chun Keat Tan; Kian Ming Lim; Roy Kwang Yang Chang; Chin Poo Lee; Ali Alqahtani

doi:10.3390/s23125555

Sensors (Basel). 2023 Jun 14;23(12):5555. doi: 10.3390/s23125555.

Authors

Chun Keat Tan¹, Kian Ming Lim¹, Roy Kwang Yang Chang¹, Chin Poo Lee¹, Ali Alqahtani²,³

Affiliations

¹ Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia.
² Department of Computer Science, King Khalid University, Abha 61421, Saudi Arabia.
³ Center for Artificial Intelligence (CAI), King Khalid University, Abha 61421, Saudi Arabia.

Abstract

Hand gesture recognition (HGR) is a crucial area of research that enhances communication by overcoming language barriers and facilitating human-computer interaction. Although previous works in HGR have employed deep neural networks, they fail to encode the orientation and position of the hand in the image. To address this issue, this paper proposes HGR-ViT, a Vision Transformer (ViT) model with an attention mechanism for hand gesture recognition. A hand gesture image is first split into fixed-size patches. Positional embeddings are added to the patch embeddings to form learnable vectors that capture the positional information of the hand patches. The resulting sequence of vectors then serves as the input to a standard Transformer encoder, which produces the hand gesture representation. A multilayer perceptron head is added to the output of the encoder to classify the hand gesture into the correct class. The proposed HGR-ViT obtains accuracies of 99.98%, 99.36% and 99.85% on the American Sign Language (ASL) dataset, the ASL with Digits dataset, and the National University of Singapore (NUS) hand gesture dataset, respectively.
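The patch-splitting and positional-embedding steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The function names, the 4x4 grayscale toy image, the 2x2 patch size, the embedding dimension of 8, and the random weights standing in for learned parameters are all assumptions made for illustration. The linear projection of each flattened patch is the usual ViT step and is assumed here rather than stated in the abstract. The Transformer encoder and the multilayer perceptron head are omitted.

```python
import random


def split_into_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch row by row into a single vector.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


def linear_projection(vector, weights):
    """Project a flattened patch to the embedding dimension.

    weights has one row per output dimension, each of length len(vector).
    """
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def embed_patches(image, patch_size, embed_dim, seed=0):
    """Return one position-aware vector per patch.

    Random values stand in for the learned projection weights and the
    learned positional embeddings (an assumption for illustration only).
    """
    rng = random.Random(seed)
    patches = split_into_patches(image, patch_size)
    patch_len = patch_size * patch_size
    weights = [[rng.gauss(0, 0.02) for _ in range(patch_len)]
               for _ in range(embed_dim)]
    positions = [[rng.gauss(0, 0.02) for _ in range(embed_dim)]
                 for _ in patches]
    # Add each patch's positional embedding to its projected embedding.
    return [
        [e + p for e, p in zip(linear_projection(patch, weights), pos)]
        for patch, pos in zip(patches, positions)
    ]


# Toy 4x4 grayscale "image" whose pixel values are 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
tokens = embed_patches(image, 2, embed_dim=8)
```

In this toy setting the image yields four patches, and `tokens` holds four vectors of length 8. In a full model, a sequence like this would be passed to the Transformer encoder.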

Keywords: ViT; attention; hand gesture recognition; sign language recognition; vision transformer.

MeSH terms

Gestures*
Hand
Humans
Neural Networks, Computer
Pattern Recognition, Automated* / methods
Sign Language
Upper Extremity
