An Arrhythmia Classification Model Based on Vision Transformer with Deformable Attention

Yanfang Dong; Miao Zhang; Lishen Qiu; Lirong Wang; Yong Yu

doi:10.3390/mi14061155

Micromachines (Basel). 2023 May 30;14(6):1155. doi: 10.3390/mi14061155.

Authors

Yanfang Dong¹ ², Miao Zhang², Lishen Qiu¹, Lirong Wang² ³, Yong Yu²

Affiliations

¹ School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China.
² Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China.
³ School of Electronics and Information Technology, Soochow University, Suzhou 215031, China.

Abstract

The electrocardiogram (ECG) is a highly effective non-invasive tool for monitoring heart activity and diagnosing cardiovascular diseases (CVDs). Automatic ECG-based arrhythmia detection plays a critical role in the early prevention and diagnosis of CVDs. In recent years, numerous studies have applied deep learning methods to arrhythmia classification. However, the transformer-based neural networks in current research still show limited performance in detecting arrhythmias from multi-lead ECGs. In this study, we propose an end-to-end multi-label arrhythmia classification model for 12-lead ECG recordings of varied length. Our model, called CNN-DVIT, combines convolutional neural networks (CNNs) that use depthwise separable convolution with a vision transformer structure that uses deformable attention. In addition, we introduce a spatial pyramid pooling layer so that the model can accept varied-length ECG signals. Experimental results show that our model achieved an F1 score of 82.9% on CPSC-2018. Notably, CNN-DVIT outperforms the latest transformer-based ECG classification algorithms. Furthermore, ablation experiments reveal that deformable multi-head attention and depthwise separable convolution are both efficient in extracting features from multi-lead ECG signals for diagnosis. Overall, CNN-DVIT achieved good performance in automatic arrhythmia detection from ECG signals. This indicates that our research can assist doctors in clinical ECG analysis, providing important support for the diagnosis of arrhythmia and contributing to the development of computer-aided diagnosis technology.
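
To illustrate how a spatial pyramid pooling layer can turn varied-length signals into a fixed-size feature vector, the following is a minimal pure-Python sketch. It is not the authors' implementation: the pyramid levels `(1, 2, 4)`, the use of max pooling, and the function names `adaptive_max_pool_1d` and `spatial_pyramid_pool_1d` are all assumptions chosen for illustration, since the abstract does not state the configuration used in CNN-DVIT.

```python
def adaptive_max_pool_1d(signal, bins):
    """Max-pool a 1-D sequence into exactly `bins` values, whatever its length."""
    n = len(signal)
    pooled = []
    for i in range(bins):
        start = (i * n) // bins                  # floor(i * n / bins)
        end = -(-((i + 1) * n) // bins)          # ceil((i + 1) * n / bins)
        pooled.append(max(signal[start:end]))
    return pooled


def spatial_pyramid_pool_1d(feature_map, levels=(1, 2, 4)):
    """Concatenate pooled values over all pyramid levels for every channel.

    `feature_map` is a list of channels, each a list of floats.
    The `levels` default is a hypothetical choice, not taken from the paper.
    """
    out = []
    for channel in feature_map:
        for bins in levels:
            out.extend(adaptive_max_pool_1d(channel, bins))
    return out


# Two toy 3-channel feature maps with different time lengths (50 vs. 173 samples).
short = [[float(t % 7) for t in range(50)] for _ in range(3)]
long = [[float(t % 11) for t in range(173)] for _ in range(3)]

# Both outputs have length channels * sum(levels), independent of input length.
print(len(spatial_pyramid_pool_1d(short)), len(spatial_pyramid_pool_1d(long)))
```

Because the output length depends only on the number of channels and the pyramid levels, a fixed-size classifier head can follow this layer without cropping or padding the recordings.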

Keywords: ECG signal; arrhythmia; deep learning; deformable attention transformer; depthwise separable convolution.

Grants and funding

2021YFC2501500/National Key Research and Development Program of China