Transferable non-invasive modal fusion-transformer (NIMFT) for end-to-end hand gesture recognition

Tianxiang Xu; Kunkun Zhao; Yuxiang Hu; Liang Li; Wei Wang; Fulin Wang; Yuxuan Zhou; Jianqing Li

doi:10.1088/1741-2552/ad39a5

J Neural Eng. 2024 Apr 9;21(2). doi: 10.1088/1741-2552/ad39a5.

Authors

Tianxiang Xu¹,²; Kunkun Zhao¹,²; Yuxiang Hu¹,²; Liang Li¹,²; Wei Wang¹,²; Fulin Wang¹,³; Yuxuan Zhou¹,²; Jianqing Li¹,²

Affiliations

¹ School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, People's Republic of China.
² The Engineering Research Center of Intelligent Theranostics Technology and Instruments, Ministry of Education, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, People's Republic of China.
³ Nanjing PANDA Electronics Equipment Co., Ltd, Nanjing 210033, People's Republic of China.

PMID: 38565124
DOI: 10.1088/1741-2552/ad39a5

Abstract

Objective. Recent studies have shown that integrating inertial measurement unit (IMU) signals with surface electromyographic (sEMG) signals can greatly improve hand gesture recognition (HGR) performance in applications such as prosthetic control and rehabilitation training. However, current deep learning models for multimodal HGR struggle with invasive modal fusion, complex feature extraction from heterogeneous signals, and limited inter-subject generalization. To address these challenges, this study aims to develop an end-to-end, inter-subject transferable model that uses non-invasively fused sEMG and acceleration (ACC) data.

Approach. The proposed non-invasive modal fusion-transformer (NIMFT) model uses 1D convolutional neural network (CNN)-based patch embedding for local information extraction and employs a multi-head cross-attention (MCA) mechanism to non-invasively integrate sEMG and ACC signals, stabilizing the variability induced by sEMG. The proposed architecture undergoes detailed ablation studies after hyperparameter tuning. Transfer learning is employed by fine-tuning a pre-trained model on a new subject, and the fine-tuned model is compared with the subject-specific model. Additionally, the performance of NIMFT is compared with that of state-of-the-art fusion models.

Main results. The NIMFT model achieved recognition accuracies of 93.91%, 91.02%, and 95.56% on the three action sets in the Ninapro DB2 dataset. The proposed embedding method and MCA outperformed the traditional invasive modal fusion transformer by 2.01% (embedding) and 1.23% (fusion), respectively. Compared with subject-specific models, the fine-tuned model showed the highest average accuracy improvement of 2.26%, achieving a final accuracy of 96.13%. Moreover, the NIMFT model outperformed the latest modal fusion models of similar scale in accuracy, recall, precision, and F1-score.

Significance. NIMFT is a novel end-to-end HGR model that uses a non-invasive MCA mechanism to integrate long-range intermodal information effectively. Compared with recent modal fusion models, it demonstrates superior performance in inter-subject experiments, and through transfer learning it offers higher training efficiency and accuracy than subject-specific approaches.
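
The two building blocks named in the Approach, 1D-convolutional patch embedding and multi-head cross-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give layer sizes, patch length, or which modality supplies the queries, so the random weights, the function names `patch_embed` and `multi_head_cross_attention`, the channel counts, and the choice of sEMG tokens as queries with ACC tokens as keys and values are all assumptions made for illustration.

```python
import numpy as np


def patch_embed(signal, kernel, stride):
    """Strided 1D convolution: (time, channels) -> (patches, d_model)."""
    patch_len, _, d_model = kernel.shape
    n_patches = (signal.shape[0] - patch_len) // stride + 1
    out = np.empty((n_patches, d_model))
    for i in range(n_patches):
        window = signal[i * stride : i * stride + patch_len]  # (patch_len, channels)
        # Contract over time and channel axes to get one d_model-sized token.
        out[i] = np.tensordot(window, kernel, axes=([0, 1], [0, 1]))
    return out


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(query_tokens, context_tokens, w_q, w_k, w_v, w_o, n_heads):
    """Queries come from one modality; keys and values come from the other."""
    n_q, d_model = query_tokens.shape
    n_c = context_tokens.shape[0]
    d_head = d_model // n_heads
    # Project, then split the model dimension into heads: (heads, tokens, d_head).
    q = (query_tokens @ w_q).reshape(n_q, n_heads, d_head).transpose(1, 0, 2)
    k = (context_tokens @ w_k).reshape(n_c, n_heads, d_head).transpose(1, 0, 2)
    v = (context_tokens @ w_v).reshape(n_c, n_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n_q, n_c)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (heads, n_q, d_head)
    merged = heads.transpose(1, 0, 2).reshape(n_q, d_model)
    return merged @ w_o, weights


# Illustrative sizes only (assumptions, not values taken from the paper).
rng = np.random.default_rng(0)
d_model, n_heads, patch_len, stride = 16, 4, 20, 20
semg = rng.standard_normal((200, 12))  # 200 time samples, 12 sEMG channels
acc = rng.standard_normal((200, 36))   # 200 time samples, 36 ACC channels

semg_tokens = patch_embed(semg, 0.1 * rng.standard_normal((patch_len, 12, d_model)), stride)
acc_tokens = patch_embed(acc, 0.1 * rng.standard_normal((patch_len, 36, d_model)), stride)

w_q, w_k, w_v, w_o = (0.1 * rng.standard_normal((d_model, d_model)) for _ in range(4))
fused, attn = multi_head_cross_attention(semg_tokens, acc_tokens, w_q, w_k, w_v, w_o, n_heads)
print(fused.shape, attn.shape)
```

In this sketch each modality keeps its own embedding and token sequence, and the only interaction is through the attention weights, which is one plausible reading of fusing the signals without concatenating them at the input. A trained model would learn the kernels and projection matrices rather than sample them at random.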

Keywords: acceleration; deep transfer learning; hand gesture recognition; multimodal fusion; sEMG; transformer.

MeSH terms

Electric Power Supplies
Electromyography
Gestures*
Mental Recall
Neural Networks, Computer
Recognition, Psychology*