Validation Study on Automated Sleep Stage Scoring Using a Deep Learning Algorithm

Jae Hoon Cho; Ji Ho Choi; Ji Eun Moon; Young Jun Lee; Ho Dong Lee; Tae Kyoung Ha

doi:10.3390/medicina58060779

Medicina (Kaunas). 2022 Jun 9;58(6):779. doi: 10.3390/medicina58060779.

Authors

Jae Hoon Cho¹, Ji Ho Choi², Ji Eun Moon³, Young Jun Lee⁴, Ho Dong Lee⁴, Tae Kyoung Ha⁴

Affiliations

¹ Department of Otorhinolaryngology-Head and Neck Surgery, Konkuk University School of Medicine, 120-1, Neungdong-ro, Gwangjin-gu, Seoul 05030, Korea.
² Department of Otorhinolaryngology-Head and Neck Surgery, Soonchunhyang University College of Medicine, Bucheon Hospital, 170, Jomaru-ro, Bucheon 14584, Korea.
³ Department of Biostatistics, Clinical Trial Center, Soonchunhyang University Bucheon Hospital, 170, Jomaru-ro, Bucheon 14584, Korea.
⁴ Honeynaps Research and Development Center, Honeynaps Co., Ltd., 4F, 529, Nonhyeon-ro, Gangnam-gu, Seoul 06126, Korea.

Abstract

Background and Objectives: Polysomnography is scored manually by sleep experts, which is a time-consuming and labor-intensive task. The goal of this study was to verify the accuracy of automated sleep-stage scoring based on a deep learning algorithm against manual sleep-stage scoring.

Materials and Methods: A total of 602 polysomnography datasets from subjects (male:female = 397:205) aged 19 to 65 years (mean age 43.8 years, standard deviation 12.2) were included in the study. The proposed model was trained using 482 datasets and validated using 48 datasets. For testing, 72 datasets were selected randomly. The performance of the proposed model was evaluated using the kappa value and a bootstrapped point estimate of median percent agreement with a 95% bootstrap confidence interval (R = 1000 bootstrap replicates).

Results: The proposed model exhibited good concordance rates with manual scoring for stages W (94%), N1 (83.9%), N2 (89%), N3 (92%), and R (93%). The average kappa value was 0.84. With the bootstrap method, high agreement between the automated deep learning algorithm and manual scoring was observed for stages W (98%), N1 (94%), N2 (92%), N3 (99%), and R (98%), and for all stages combined (96%).

Conclusions: Automated sleep-stage scoring using the proposed model may be a reliable method for sleep-stage classification.
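
To make the two agreement measures named in the abstract concrete, the following is a minimal Python sketch of Cohen's kappa and of a percentile bootstrap for the median percent agreement. It is not the authors' code. The function names, the choice of Cohen's (unweighted) kappa, the percentile method for the interval, the use of the median of the bootstrap medians as the point estimate, and the resampling of per-recording agreement values are all assumptions made for illustration; the abstract only states that a kappa value and a bootstrapped median percent agreement with a 95% confidence interval and R = 1000 were used.

```python
import random
from collections import Counter
from statistics import median


def cohen_kappa(manual, auto):
    """Cohen's kappa between two equal-length sequences of stage labels."""
    n = len(manual)
    # Observed proportion of epochs on which both scorers agree.
    observed = sum(m == a for m, a in zip(manual, auto)) / n
    # Agreement expected by chance from each scorer's label frequencies.
    m_counts, a_counts = Counter(manual), Counter(auto)
    labels = set(manual) | set(auto)
    expected = sum(m_counts[s] * a_counts[s] for s in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both scorers used one identical label
    return (observed - expected) / (1.0 - expected)


def percent_agreement(manual, auto):
    """Percentage of epochs with identical manual and automated labels."""
    return 100.0 * sum(m == a for m, a in zip(manual, auto)) / len(manual)


def bootstrap_median_agreement(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap for the median of per-recording agreement values.

    Assumption: recordings are the resampling unit and the point estimate is
    the median of the bootstrap medians; the abstract does not specify either.
    """
    rng = random.Random(seed)
    k = len(values)
    medians = sorted(
        median(rng.choices(values, k=k)) for _ in range(n_resamples)
    )
    lo_idx = max(0, round((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return median(medians), (medians[lo_idx], medians[hi_idx])


# Toy, made-up labels for one short recording (illustrative only).
toy_manual = ["W", "W", "N2", "N2"]
toy_auto = ["W", "N2", "N2", "N2"]
toy_kappa = cohen_kappa(toy_manual, toy_auto)
toy_agreement = percent_agreement(toy_manual, toy_auto)

# Toy, made-up per-recording agreement values (illustrative only).
toy_values = [88.0, 91.5, 93.0, 95.0, 96.5, 97.0, 90.0, 94.0]
toy_point, (toy_lo, toy_hi) = bootstrap_median_agreement(toy_values)
```

In this sketch, `n_resamples=1000` mirrors the R = 1000 stated in the abstract, and `alpha=0.05` corresponds to the 95% interval. The toy inputs are invented and do not reproduce any figure reported in the study.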

Keywords: algorithms; deep learning; polysomnography; sleep stages.

MeSH terms

Adult
Algorithms
Deep Learning*
Female
Humans
Male
Observer Variation
Reproducibility of Results
Sleep
Sleep Stages
