Automatic ECG classification and label quality in training data

Ľubomír Antoni; Erik Bruoth; Peter Bugata; Peter Bugata Jr; Dávid Gajdoš; Šimon Horvát; Dávid Hudák; Vladimíra Kmečová; Richard Staňa; Monika Staňková; Alexander Szabari; Gabriela Vozáriková

doi:10.1088/1361-6579/ac69a8

Physiol Meas. 2022 Jun 28;43(6). doi: 10.1088/1361-6579/ac69a8.

Affiliations

¹ Institute of Computer Science, Faculty of Science, Pavol Jozef Šafárik University in Košice, Jesenná 5, 040 01, Košice, Slovakia.
² Data Science Laboratory, VSL Software, a.s., Lomená 8, 040 01, Košice, Slovakia.

PMID: 35453131

Abstract

Objective. Within the PhysioNet/Computing in Cardiology Challenge 2021, we focused on the design of a machine learning algorithm to identify cardiac abnormalities from electrocardiogram recordings (ECGs) with varying numbers of leads and to assess the diagnostic potential of reduced-lead ECGs compared to standard 12-lead ECGs.

Approach. In our solution, we developed a model based on a deep convolutional neural network, which is a 1D variant of the popular ResNet50 network. This base model was pre-trained on a large training set with our proposed mapping of original labels to SNOMED codes, using three-valued labels. In the next phase, the model was fine-tuned for the Challenge metric and conditions.

Main results. In the Challenge, our proposed approach (team CeZIS) achieved a Challenge test score of 0.52 for all lead configurations, placing us 5th out of 39 in the official ranking. Our improved post-Challenge solution was evaluated as the best for all ranked configurations, i.e. for the 12-lead, 3-lead, and 2-lead versions of the full test set, with Challenge test scores of 0.62, 0.61, and 0.59, respectively.

Significance. In addition to building the model for identifying cardiac anomalies, we provide a more detailed description of the issues associated with label mapping and propose a modified mapping in order to obtain a better starting point for training more powerful classification models. We compare the performance of models for different numbers of leads and identify labels for which two leads are sufficient. Moreover, we evaluate the label quality in individual parts of the Challenge training set.
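
The abstract mentions three-valued labels but does not say how they enter the training loss. As an illustration only, the sketch below assumes one common reading: each label is positive (1), negative (0), or unknown, and unknown entries are left out of a binary cross-entropy loss. The `UNKNOWN` marker and the function `masked_bce` are hypothetical names introduced here, not taken from the paper.

```python
import math

# Hypothetical marker for a label whose presence was not assessed.
# This encoding is an assumption, not the paper's stated scheme.
UNKNOWN = -1


def masked_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over entries labelled 0 or 1.

    Entries labelled UNKNOWN contribute nothing to the loss, so the
    model is neither rewarded nor penalized for its output on them.
    """
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y == UNKNOWN:
            continue  # skip labels of unknown status
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    # An all-unknown input yields zero loss rather than a division error.
    return total / count if count else 0.0


# Per-class probabilities for one recording, with the third class unknown.
loss = masked_bce([0.8, 0.2, 0.99], [1, 0, UNKNOWN])
```

Under this assumption, a recording from a data source that never annotates a given diagnosis would not push the model toward predicting that diagnosis as absent.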

Keywords: ECG signal; deep neural network; multi-label classification.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Electrocardiography* / methods
Machine Learning
Neural Networks, Computer*