On Merging Feature Engineering and Deep Learning for Diagnosis, Risk Prediction and Age Estimation Based on the 12-Lead ECG

Eran Zvuloni; Jesse Read; Antonio H Ribeiro; Antonio Luiz P Ribeiro; Joachim A Behar

doi:10.1109/TBME.2023.3239527

IEEE Trans Biomed Eng. 2023 Jul;70(7):2227-2236. doi: 10.1109/TBME.2023.3239527. Epub 2023 Jun 19.

PMID: 37022038

Abstract

Objective: Over the past few years, deep learning (DL) has been used extensively in research on 12-lead electrocardiogram (ECG) analysis. However, it is unclear whether the explicit or implicit claims that DL is superior to the more classical feature engineering (FE) approaches, which are based on domain knowledge, hold. In addition, it remains unclear whether combining DL with FE improves performance over either modality alone.

Methods: To address these research gaps and in line with recent major experiments, we revisited three tasks: cardiac arrhythmia diagnosis (multiclass-multilabel classification), atrial fibrillation risk prediction (binary classification), and age estimation (regression). We used an overall dataset of 2.3M 12-lead ECG recordings to train the following models for each task: i) a random forest taking FE as input; ii) an end-to-end DL model; and iii) a merged model of FE+DL.
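
The abstract does not say how the FE and DL branches were merged. As an illustration only, the sketch below shows one common option, late fusion by concatenation: each recording's engineered feature vector is joined to an embedding from a DL branch, and the fused vector feeds a linear layer with one sigmoid per label (a multilabel output). All names, sizes, and random values (`fuse`, `multilabel_head`, `n_engineered`, and so on) are hypothetical and are not taken from the paper.

```python
import math
import random


def fuse(engineered, embedding):
    """Concatenate each recording's engineered features with its learned embedding."""
    if len(engineered) != len(embedding):
        raise ValueError("both inputs must describe the same recordings")
    return [list(fe) + list(emb) for fe, emb in zip(engineered, embedding)]


def multilabel_head(fused, weights, biases):
    """Linear layer followed by an independent sigmoid per label."""
    outputs = []
    for row in fused:
        logits = [
            sum(w * x for w, x in zip(label_weights, row)) + b
            for label_weights, b in zip(weights, biases)
        ]
        outputs.append([1.0 / (1.0 + math.exp(-z)) for z in logits])
    return outputs


# Hypothetical sizes and random stand-in data (assumptions, not the paper's values).
rng = random.Random(0)
n_recordings, n_engineered, n_embedding, n_labels = 4, 5, 3, 2

# Stand-ins for the FE output and for the DL branch's embedding of each ECG.
engineered = [[rng.gauss(0, 1) for _ in range(n_engineered)] for _ in range(n_recordings)]
embedding = [[rng.gauss(0, 1) for _ in range(n_embedding)] for _ in range(n_recordings)]

fused = fuse(engineered, embedding)

# Untrained head: one weight vector and one bias per label.
weights = [
    [rng.gauss(0, 0.1) for _ in range(n_engineered + n_embedding)]
    for _ in range(n_labels)
]
biases = [0.0] * n_labels

probabilities = multilabel_head(fused, weights, biases)
print(len(fused[0]), len(probabilities[0]))
```

In a trained system the head's weights would be learned jointly with, or on top of, the DL branch. For the binary task the head would have a single output, and for age estimation the sigmoid would be dropped to give a regression output.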

Results: FE yielded comparable results to DL while necessitating significantly less data for the two classification tasks. DL outperformed FE for the regression task. For all tasks, merging FE with DL did not improve performance over DL alone. These findings were confirmed on the additional PTB-XL dataset.

Conclusion: We found that for traditional 12-lead ECG based diagnosis tasks, DL did not yield a meaningful improvement over FE, whereas it significantly improved performance on the nontraditional regression task. We also found that combining FE with DL did not improve over DL alone, which suggests that the FE was redundant with the features learned by DL.

Significance: Our findings provide important recommendations on which 12-lead ECG based machine learning strategy and data regime to choose for a given task. When maximizing performance is the end goal, DL is preferable if the task is nontraditional and a large dataset is available. If the task is a classical one and/or only a small dataset is available, then an FE approach may be the better choice.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Atrial Fibrillation*
Deep Learning*
Electrocardiography / methods
Humans
Machine Learning