Contactless blood oxygen estimation from face videos: A multi-model fusion method based on deep learning

Min Hu; Xia Wu; Xiaohua Wang; Yan Xing; Ning An; Piao Shi

doi:10.1016/j.bspc.2022.104487

Biomed Signal Process Control. 2023 Mar:81:104487. doi: 10.1016/j.bspc.2022.104487. Epub 2022 Dec 10.

Authors

Min Hu¹, Xia Wu¹, Xiaohua Wang¹, Yan Xing², Ning An¹,³, Piao Shi¹

Affiliations

¹ Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei, Anhui 230601, China.
² School of Mathematics, Hefei University of Technology, Hefei, Anhui 230601, China.
³ National Smart Eldercare International S&T Cooperation Base, Hefei University of Technology, Hefei, Anhui 230601, China.

Abstract

Blood oxygen saturation ($\mathrm{SpO}_{2}$), a key indicator of respiratory function, has received increasing attention during the COVID-19 pandemic. Clinical results show that patients with COVID-19 are likely to have distinctly lower $\mathrm{SpO}_{2}$ before the onset of significant symptoms. To address the shortcomings of current methods for monitoring $\mathrm{SpO}_{2}$ from face videos, this paper proposes a novel multi-model fusion method based on deep learning for $\mathrm{SpO}_{2}$ estimation. The method comprises a feature extraction network named Residuals and Coordinate Attention (RCA) and a multi-model fusion $\mathrm{SpO}_{2}$ estimation module. The RCA network uses a cascade of residual blocks and a coordinate attention mechanism to capture both the correlation between feature channels and the location information in the feature space. The multi-model fusion module includes the Color Channel Model (CCM) and the Network-Based Model (NBM). To make full use of the color feature information in face videos, an image generator is constructed in the CCM to calculate $\mathrm{SpO}_{2}$ by reconstructing the red and blue channel signals. In addition, to reduce the disturbance from other physiological signals, a novel two-part loss function is designed in the NBM. Given the complementarity of the features and models that the CCM and NBM focus on, a Multi-Model Fusion Model (MMFM) is constructed. Experimental results on the PURE and VIPL-HR datasets show that all three models meet the clinical requirement (mean absolute error $\leq$ 2%) and demonstrate that multi-model fusion can fully exploit the $\mathrm{SpO}_{2}$ features of face videos and improve $\mathrm{SpO}_{2}$ estimation performance. Our research will facilitate applications in remote medicine and home health.

Keywords: Coordinate attention; Deep learning; Estimation; Multi-model fusion; Remote photo-plethysmography; Residual network.
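
The abstract does not state the formula the CCM applies to the reconstructed red and blue channel signals. As a rough illustration only, the sketch below uses the classical ratio-of-ratios relation from camera-based oximetry, $\mathrm{SpO}_{2} = A - B \cdot R$, where $R$ is the ratio of the red channel's AC/DC to the blue channel's AC/DC. This relation is an assumption and may differ from the authors' model. The function names and the calibration constants `a` and `b` are hypothetical placeholders, not values from the paper. The last helper shows how the clinical criterion quoted in the abstract (mean absolute error at most 2%) could be checked against reference readings.

```python
from statistics import mean, pstdev


def ratio_of_ratios(red, blue):
    """Return (AC_red / DC_red) / (AC_blue / DC_blue).

    AC is approximated by the standard deviation of the channel signal
    and DC by its mean, a common simplification.
    """
    ac_red, dc_red = pstdev(red), mean(red)
    ac_blue, dc_blue = pstdev(blue), mean(blue)
    return (ac_red / dc_red) / (ac_blue / dc_blue)


def estimate_spo2(red, blue, a=100.0, b=5.0):
    """Estimate SpO2 (in percent) as a - b * R.

    a and b are hypothetical calibration constants chosen for this
    example; real values must be fitted against a reference oximeter.
    """
    return a - b * ratio_of_ratios(red, blue)


def mean_absolute_error(predicted, reference):
    """Mean absolute error between predicted and reference SpO2 values."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def meets_clinical_requirement(predicted, reference, threshold=2.0):
    """Check the criterion quoted in the abstract: MAE <= 2 (percent SpO2)."""
    return mean_absolute_error(predicted, reference) <= threshold


if __name__ == "__main__":
    # Toy per-frame channel means from a face region (made-up numbers).
    red = [99, 101, 99, 101]
    blue = [49, 51, 49, 51]
    print(estimate_spo2(red, blue))
    print(meets_clinical_requirement([97.5, 96.0], [98.0, 97.0]))
```

Only the standard library is used so the example stays self-contained. In practice the channel signals would first be detrended and band-pass filtered around the pulse frequency before the AC component is measured.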