Learning Representations from Heart Sound: A Comparative Study on Shallow and Deep Models

Cyborg Bionic Syst. 2024 Mar 4:5:0075. doi: 10.34133/cbsystems.0075. eCollection 2024.

Abstract

Leveraging the power of artificial intelligence to facilitate automatic analysis and monitoring of heart sounds has attracted increasing effort over the past decade. Nevertheless, the lack of a standard open-access database made it difficult to sustain comparable research before the first release of the PhysioNet CinC Challenge Dataset. Even now, inconsistent standards for data collection, annotation, and partitioning still prevent fair and efficient comparison between different works. To this end, we introduced and benchmarked a first version of the Heart Sounds Shenzhen (HSS) corpus. Motivated by previous works based on HSS, in this study we redefined the tasks and conducted a comprehensive investigation of shallow and deep models. First, we segmented the heart sound recordings into shorter segments (10 s), making the setting more similar to human auscultation. Second, we redefined the classification tasks: besides the 3 class categories (normal, moderate, and mild/severe) adopted in HSS, we added a binary classification task, i.e., normal vs. abnormal. We provide detailed benchmarks based on both classic machine learning and state-of-the-art deep learning technologies, which are reproducible using open-source toolkits. Last but not least, we analyzed the feature contributions of the best-performing benchmark model to make the results more convincing and interpretable.
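
The preprocessing described above (splitting recordings into 10 s segments and collapsing the 3-class labels into a binary normal/abnormal task) can be sketched as follows. This is an illustrative outline only, not the authors' actual pipeline: the function names, the sampling rate, and the handling of the final partial segment are assumptions not specified in the abstract.

```python
import numpy as np

def segment_recording(signal: np.ndarray, sr: int, seg_seconds: float = 10.0):
    """Split a 1-D heart sound signal into non-overlapping fixed-length segments.

    Only full-length segments are kept; the abstract does not specify
    how a trailing partial segment is handled.
    """
    seg_len = int(seg_seconds * sr)
    n_full = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]

def to_binary(label: str) -> str:
    """Map the 3-class HSS labels to the binary normal/abnormal task."""
    return "normal" if label == "normal" else "abnormal"

# Example: a 35 s recording at an assumed 4 kHz sampling rate
# yields three complete 10 s segments.
sig = np.zeros(35 * 4000)
segs = segment_recording(sig, sr=4000)
```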