Variational Autoencoders for Biomedical Signal Morphology Clustering and Noise Detection

IEEE J Biomed Health Inform. 2023 Sep 28:PP:10.1109/JBHI.2023.3320585. doi: 10.1109/JBHI.2023.3320585. Online ahead of print.

Abstract

Accurate estimation of physiological biomarkers using raw waveform data from non-invasive wearable devices requires extensive data preprocessing. An automatic noise detection method for time-series data would offer significant utility across various domains. As data labeling is onerous, a minimally supervised abnormality detection method for input data, together with an estimate of the severity of signal corruption, is essential. We propose a model-free, time-series biomedical waveform noise detection framework using a Variational Autoencoder coupled with Gaussian Mixture Models, which can detect a range of waveform abnormalities without annotation and provides a confidence metric for each segment. Our technique operates on biomedical signals that exhibit the periodicity of cardiac activity. This framework can be applied to any machine learning or deep learning model as an initial signal validator component. Moreover, the confidence score generated by the proposed framework can be incorporated into the optimization of different models to construct confidence-aware modeling. We conduct experiments using the dynamic time warping (DTW) distance between segments and validated cardiac cycle morphology. The results confirm that our approach removes noisy cardiac cycles, and the remaining signals, classified as clean, exhibit a 59.92% reduction in the standard deviation of DTW distances. Using a bio-impedance dataset of 97885 cardiac cycles, we further demonstrate a significant improvement in the downstream task of cuffless blood pressure estimation, with an average reduction in root mean square error (RMSE) of 2.67 mmHg for diastolic blood pressure and 2.13 mmHg for systolic blood pressure, and increases in average Pearson correlation of 0.28 and 0.08, respectively, along with a statistically significant improvement in signal-to-noise ratio in the presence of different synthetic noise sources. This enables burden-free validation of wearable sensor data for downstream biomedical applications.
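
The DTW-based evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sine-wave template, the synthetic segments, the noise levels, and the absolute-difference local cost are all assumptions made for illustration, and the clean/noisy split is known here by construction rather than produced by a Variational Autoencoder and Gaussian Mixture Model. The sketch only shows how the spread of DTW distances to a reference morphology shrinks once noisy segments are removed.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    Uses the absolute difference as the local cost (an assumption; the
    abstract does not specify the local cost).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(
                cost[i - 1, j],      # step in a only
                cost[i, j - 1],      # step in b only
                cost[i - 1, j - 1],  # step in both
            )
    return float(cost[n, m])


# Hypothetical stand-ins for cardiac cycles: one sine period per segment.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)
template = np.sin(2 * np.pi * t)  # assumed "validated" morphology

clean = [template + rng.normal(0.0, 0.02, t.size) for _ in range(20)]
noisy = [template + rng.normal(0.0, 0.50, t.size) for _ in range(5)]

# DTW distance of every segment to the template.
all_d = np.array([dtw_distance(s, template) for s in clean + noisy])

# Segments kept after noise removal (here: the ones built as clean).
kept_d = all_d[: len(clean)]

# Percentage reduction in the standard deviation of DTW distances.
reduction = 100.0 * (1.0 - kept_d.std() / all_d.std())
print(f"std before: {all_d.std():.3f}, after: {kept_d.std():.3f}, "
      f"reduction: {reduction:.1f}%")
```

In a real pipeline, the kept segments would be those the noise detector classifies as clean, and the template would come from validated cardiac cycles rather than a synthetic sine wave.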