An automatic method using MFCC features for sleep stage classification

Brain Inform. 2024 Feb 10;11(1):6. doi: 10.1186/s40708-024-00219-w.

Abstract

Sleep stage classification is a necessary step for diagnosing sleep disorders. Generally, experts use traditional methods based on every 30 seconds (s) of the biological signals, such as electrooculograms (EOGs), electrocardiograms (ECGs), electromyograms (EMGs), and electroencephalograms (EEGs), to classify sleep stages. Recently, various state-of-the-art approaches based on a deep learning model have been demonstrated to have efficient and accurate outcomes in sleep stage classification. In this paper, a novel deep convolutional neural network (CNN) combined with a long short-time memory (LSTM) model is proposed for sleep scoring tasks. A key frequency domain feature named Mel-frequency Cepstral Coefficient (MFCC) is extracted from EEG and EMG signals. The proposed method can learn features from frequency domains on different bio-signal channels. It firstly extracts the MFCC features from multi-channel signals, and then inputs them to several convolutional layers and an LSTM layer. Secondly, the learned representations are fed to a fully connected layer and a softmax classifier for sleep stage classification. The experiments are conducted on two widely used sleep datasets, Sleep Heart Health Study (SHHS) and Vincent's University Hospital/University College Dublin Sleep Apnoea (UCDDB) to test the effectiveness of the method. The results of this study indicate that the model can perform well in the classification of sleep stages using the features of the 2-dimensional (2D) MFCC feature. The advantage of using the feature is that it can be used to input a two-dimensional data stream, which can be used to retain information about each sleep stage. Using 2D data streams can reduce the time it takes to retrieve the data from the one-dimensional stream. Another advantage of this method is that it eliminates the need for deep layers, which can help improve the performance of the model. For instance, by reducing the number of layers, our seven layers of the model structure takes around 400 s to train and test 100 subjects in the SHHS1 dataset. Its best accuracy and Cohen's kappa are 82.35% and 0.75 for the SHHS dataset, and 73.07% and 0.63 for the UCDDB dataset, respectively.

Keywords: Convolutional neural network; Long short-term memory; Mel-frequency cepstral coefficients; Sleep stages.