Biometric Contrastive Learning for Data-Efficient Deep Learning from Electrocardiographic Images

medRxiv [Preprint]. 2023 Sep 14:2023.09.13.23295494. doi: 10.1101/2023.09.13.23295494.

Abstract

Objective: Artificial intelligence (AI) can detect heart disease from images of electrocardiograms (ECGs); however, traditional supervised learning is limited by the need for large amounts of labeled data. We report the development of Biometric Contrastive Learning (BCL), a self-supervised pretraining approach for label-efficient deep learning on ECG images.

Materials and methods: Using pairs of ECGs from 78,288 individuals from Yale (2000-2015), we trained a convolutional neural network to identify temporally separated ECG pairs from the same patient that varied in plot layout. We fine-tuned BCL-pretrained models to detect atrial fibrillation (AF), gender, and left ventricular ejection fraction (LVEF) <40%, using ECGs from 2015-2021. We externally tested the models in cohorts from Germany and the US, and compared BCL with random initialization and with general-purpose self-supervised contrastive learning for images (SimCLR). A minimal code sketch of this pretraining setup follows.
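The sketch below illustrates the patient-level contrastive pretraining idea described above, assuming a SimCLR-style NT-Xent objective in which the positive pair is two temporally separated ECG images from the same patient rather than two augmentations of one image. The encoder, function names, and hyperparameters (ECGEncoder, nt_xent_loss, ResNet-18 backbone, temperature 0.1) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: patient-level contrastive pretraining for ECG images.
# Assumption: a SimCLR-style NT-Xent loss where positives are two ECGs
# from the same patient; all names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class ECGEncoder(nn.Module):
    """CNN backbone plus projection head used only during pretraining."""

    def __init__(self, proj_dim: int = 128):
        super().__init__()
        backbone = resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()          # keep convolutional features only
        self.backbone = backbone
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.projector(self.backbone(x))


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """NT-Xent loss: ECGs from the same patient are positives, all others negatives."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit-norm embeddings
    sim = z @ z.t() / temperature                         # scaled cosine-similarity matrix
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))                 # exclude self-similarity
    # Each sample's positive sits n rows away: i <-> i + n
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


if __name__ == "__main__":
    # Each batch element is a pair of ECG images acquired at different times
    # (and possibly in different plot layouts) from the same patient.
    encoder = ECGEncoder()
    ecg_t0 = torch.randn(8, 3, 224, 224)   # earlier ECG images
    ecg_t1 = torch.randn(8, 3, 224, 224)   # later ECG images, same 8 patients
    loss = nt_xent_loss(encoder(ecg_t0), encoder(ecg_t1))
    loss.backward()
    print(f"contrastive loss: {loss.item():.3f}")
```

After pretraining with such an objective, the projection head would be discarded and the backbone fine-tuned on the labeled downstream tasks (AF, gender, LVEF<40%).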

Results: With 100% of labeled training data, BCL performed similarly to the other approaches for detecting AF/gender/LVEF<40%, with AUROCs of 0.98/0.90/0.90 in the held-out test sets, but it consistently outperformed the other methods at smaller proportions of labeled data, reaching equivalent performance with only 50% of the data. With 0.1% of the data, BCL achieved AUROCs of 0.88/0.79/0.75, compared with 0.51/0.52/0.60 (random initialization) and 0.61/0.53/0.49 (SimCLR). In external validation, BCL outperformed the other methods even with 100% of labeled training data, with AUROCs of 0.88/0.88 for gender and LVEF<40%, compared with 0.83/0.83 (random initialization) and 0.84/0.83 (SimCLR).

Discussion and conclusion: A pretraining strategy that leverages biometric signatures of different ECGs from the same patient enhances the efficiency of developing AI models for ECG images. This represents a major advance in detecting disorders from ECG images with limited labeled data.
