CL-SPO2Net: Contrastive Learning Spatiotemporal Attention Network for Non-Contact Video-Based SpO2 Estimation

Bioengineering (Basel). 2024 Jan 24;11(2):113. doi: 10.3390/bioengineering11020113.

Abstract

Video-based peripheral oxygen saturation (SpO2) estimation using only RGB cameras offers a non-contact way to measure blood oxygen levels. Previous studies assumed a stable, unchanging environment for non-contact blood oxygen estimation and trained their systems on a small amount of labeled data. However, it is difficult to learn optimal model parameters from a small dataset, and the accuracy of blood oxygen estimation is easily affected by ambient light and subject movement. To address these issues, this paper proposes a contrastive learning spatiotemporal attention network (CL-SPO2Net), a semi-supervised network for video-based SpO2 estimation. Spatiotemporal similarities in remote photoplethysmography (rPPG) signals were identified in video segments containing facial or hand regions, and SpO2 was then estimated by integrating deep neural networks with machine learning expertise. The method proved feasible with small-scale labeled datasets: the mean absolute error between the camera-based estimate and the reference pulse oximeter was 0.85% in a stable environment, 1.13% under lighting fluctuations, and 1.20% during facial rotation.
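For readers unfamiliar with the reported metric, the mean absolute error (MAE) is the average of the absolute differences between paired camera estimates and pulse oximeter readings, in SpO2 percentage points. The following is a minimal sketch of that calculation only; the function name and the sample readings are hypothetical illustrations, not data or code from the paper.

```python
def mean_absolute_error(estimated, reference):
    """Average absolute difference between paired SpO2 readings (in % SpO2)."""
    if len(estimated) != len(reference):
        raise ValueError("estimated and reference must have the same length")
    if not estimated:
        raise ValueError("at least one paired reading is required")
    total = sum(abs(e - r) for e, r in zip(estimated, reference))
    return total / len(estimated)


# Hypothetical paired readings, invented purely for illustration.
camera_spo2 = [97.2, 96.1, 98.4, 95.0]    # camera-based estimates (% SpO2)
oximeter_spo2 = [98.0, 96.0, 98.0, 96.0]  # reference pulse oximeter (% SpO2)

mae = mean_absolute_error(camera_spo2, oximeter_spo2)
print(f"MAE: {mae:.3f} % SpO2")
```

A lower MAE means the camera estimates track the reference oximeter more closely on average; the metric does not indicate whether the errors are systematically high or low.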

Keywords: attention mechanism; computer vision; contrastive learning; deep learning; peripheral oxygen saturation (SpO2); remote photoplethysmography (rPPG).