End-to-End Sleep Staging Using Nocturnal Sounds from Microphone Chips for Mobile Devices

Joonki Hong; Hai Hong Tran; Jinhwan Jung; Hyeryung Jang; Dongheon Lee; In-Young Yoon; Jung Kyung Hong; Jeong-Whun Kim

doi:10.2147/NSS.S361270

Nat Sci Sleep. 2022 Jun 25:14:1187-1201. doi: 10.2147/NSS.S361270. eCollection 2022.

Authors

Joonki Hong¹,², Hai Hong Tran¹, Jinhwan Jung¹, Hyeryung Jang³, Dongheon Lee¹, In-Young Yoon⁴,⁵, Jung Kyung Hong^#⁴,⁵, Jeong-Whun Kim^#⁵,⁶

Affiliations

¹ Asleep Inc., Seoul, Korea.
² Korea Advanced Institute of Science and Technology, Daejeon, Korea.
³ Dongguk University, Seoul, Korea.
⁴ Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam, Korea.
⁵ Seoul National University College of Medicine, Seoul, Korea.
⁶ Department of Otorhinolaryngology, Seoul National University Bundang Hospital, Seongnam, Korea.

^# Contributed equally.

Abstract

Purpose: Nocturnal sounds contain a wealth of information and can be easily obtained in a non-contact manner. Sleep staging using nocturnal sounds recorded by common mobile devices may allow daily at-home sleep tracking. The objective of this study is to introduce an end-to-end (sound-to-sleep-stage) deep learning model for sound-based sleep staging, designed to work with audio from microphone chips, which are essential components of mobile devices such as modern smartphones.

Patients and methods: Two different audio datasets were used: audio data routinely recorded by a single microphone chip during polysomnography (PSG dataset, N=1154) and audio data recorded by a smartphone (smartphone dataset, N=327). The audio was converted into Mel spectrograms to expose latent time-frequency patterns of breathing and body movement within ambient noise. The proposed neural network model first extracts features from each 30-second epoch, then analyzes inter-epoch relationships among the extracted features, and finally classifies each epoch into a sleep stage.
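The preprocessing step described above (cutting the recording into 30-second epochs and converting each one into a Mel spectrogram) can be illustrated with a minimal NumPy sketch. Only the 30-second epoch length comes from the abstract; the sampling rate, FFT size, hop length, number of Mel bands, and all function names below are assumptions chosen for illustration and are not the authors' settings or code.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Hz -> Mel conversion.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the Mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_epochs(audio, sr=16000, epoch_sec=30, n_fft=1024, hop=512, n_mels=20):
    """Split mono audio into fixed-length epochs and return log-Mel spectrograms.

    Assumed (hypothetical) defaults: 16 kHz audio, 1024-point FFT,
    512-sample hop, 20 Mel bands. A trailing partial epoch is dropped.
    Output shape: (n_epochs, n_mels, n_frames).
    """
    audio = np.asarray(audio, dtype=float)
    samples_per_epoch = sr * epoch_sec
    n_epochs = len(audio) // samples_per_epoch
    n_frames = 1 + (samples_per_epoch - n_fft) // hop
    fb = mel_filterbank(sr, n_fft, n_mels)
    window = np.hanning(n_fft)
    out = np.empty((n_epochs, n_mels, n_frames))
    for e in range(n_epochs):
        seg = audio[e * samples_per_epoch:(e + 1) * samples_per_epoch]
        # Overlapping frames of length n_fft, one every `hop` samples.
        frames = np.lib.stride_tricks.sliding_window_view(seg, n_fft)[::hop]
        power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
        mel = power @ fb.T                      # (n_frames, n_mels)
        out[e] = np.log(mel + 1e-10).T          # small offset avoids log(0)
    return out
```

Calling `log_mel_epochs(audio)` on a mono recording returns one spectrogram per complete 30-second epoch. In a pipeline like the one described, each such array would be the input to the per-epoch feature extractor, and the resulting sequence of features would then be passed to the inter-epoch stage.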

Results: Our model achieved 70% epoch-by-epoch agreement for 4-class (wake, light, deep, REM) sleep stage classification and showed robust performance across various signal-to-noise conditions. Model performance was not considerably affected by sleep apnea or periodic limb movement. External validation with the smartphone dataset also showed 68% epoch-by-epoch agreement.
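Epoch-by-epoch agreement is the fraction of 30-second epochs for which the predicted stage matches the reference stage. The sketch below shows that calculation for the 4-class setting; the integer stage encoding, the function names, and the toy label sequences are hypothetical and serve only to illustrate the metric, not to reproduce the study's data.

```python
import numpy as np

# Assumed integer encoding of the four classes named in the abstract.
STAGES = ("wake", "light", "deep", "REM")

def epoch_agreement(reference, predicted):
    """Fraction of epochs where the predicted stage equals the reference stage."""
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    if reference.shape != predicted.shape:
        raise ValueError("reference and predicted must have the same length")
    return float(np.mean(reference == predicted))

def confusion_matrix(reference, predicted, n_classes=len(STAGES)):
    """Rows index the reference stage, columns index the predicted stage."""
    reference = np.asarray(reference, dtype=int)
    predicted = np.asarray(predicted, dtype=int)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (reference, predicted), 1)
    return cm

# Toy example: ten epochs, seven of which match.
reference = [0, 1, 1, 2, 3, 1, 1, 0, 2, 3]
predicted = [0, 1, 1, 1, 3, 1, 0, 0, 2, 1]
agreement = epoch_agreement(reference, predicted)   # 0.7
cm = confusion_matrix(reference, predicted)
```

The diagonal of the confusion matrix counts the matching epochs, so its sum divided by the total number of epochs equals the agreement, while the off-diagonal entries show which stages are confused with each other.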

Conclusion: The proposed end-to-end deep learning model shows that low-quality sounds recorded by microphone chips have potential for use in sleep staging. Future studies using nocturnal sounds recorded by mobile devices in the home environment may further confirm the utility of mobile device recordings for at-home sleep tracking.

Keywords: deep learning; polysomnography; respiratory sounds; sleep stages; smartphone.