Introduction
Sleep is essential to human health [1]. Different sleep stages, such as non-rapid eye movement (NREM) and rapid eye movement (REM), are essential for memory consolidation, attention improvement, emotion regulation, and so forth [2], [3]. Accurately classifying sleep stages is therefore indispensable for understanding how sleep impacts human physical and mental health. However, manual sleep staging relies heavily on the knowledge and labor of sleep experts, and the manual scoring process is empirical and time-consuming [4], [5]. By contrast, automatic sleep staging promises to enhance both the accuracy and the efficiency of sleep analysis [6], [7].
Sleep staging refers to distinguishing the stages of human sleep. Sleep specialists generally categorize sleep stages based on polysomnography (PSG), which consists of EEG, electrooculogram (EOG), electromyogram (EMG), and electrocardiogram (ECG) [8]. This paper focuses on sleep staging with single-channel EEG. Compared with PSG or multi-channel EEG, single-channel EEG holds great practical significance, because collecting only one kind of signal via a single channel is convenient and efficient. Besides, technological improvement of sleep staging based on single-channel EEG also helps to enhance the performance of sleep staging using multi-channel EEG or full PSG. According to the American Academy of Sleep Medicine (AASM) criteria, PSG data can be divided into WAKE, REM, and NREM, and NREM can be further classified into the N1, N2, and N3 stages. In different sleep stages, the EEG signals display different waveforms, amplitudes, and spectra [9]. For instance, the salient waves of the REM stage are sawtooth waves, whereas those of the N2 stage are sleep spindles or K-complexes [10]. Capturing the characteristics of these signal wave patterns is beneficial for sleep stage classification. Moreover, sleep transition rules are also informative for distinguishing sleep stages, especially those between neighboring sleep stages, such as W-N1-N1-W-N1-N1, N2-N2-N3-N2-N3, N2-N2-REM, etc.
Many researchers have applied deep learning to EEG-based sleep staging. Typical methods include the convolutional neural network (CNN) [11], [12], the convolutional recurrent neural network (CRNN) [13], [14], the fully convolutional network (FCN) [15], etc. Early methodology relies on the one-to-one scheme, in which one EEG epoch corresponds to one sleep stage [16]. Generally, EEG signal waves of different sleep stages display distinctive temporal and spectral characteristics. For example, K-complexes occur approximately every 1.0-1.7 minutes, whereas the alpha rhythm oscillates periodically in the frequency range of 8 to 12 Hz. Therefore, multi-scale feature extraction plays an important role in sleep staging, because it can capture the different characteristics of salient EEG waves. Eldele et al. [17] designed two parallel CNNs, which utilize small and large filters to learn representations of salient EEG waves for classifying sleep stages. Wang et al. [18] employed the attention mechanism and multi-scale convolution to extract salient wave features from EEG for sleep staging. Although CNN models have shown inspiring performance in sleep stage classification, their one-to-one scheme ignores the important sleep transition rules between neighboring sleep stages.
In recent years, both many-to-one and sequence-to-sequence schemes, which rely on multiple EEG epochs for sleep staging, have attracted increasing research interest [11], [12], [14]. These two schemes take into account the transition patterns of neighboring sleep stages and thus achieve encouraging performance [19]. Dong et al. [20] put forward a rectifier neural network to learn hierarchical features from EEG epochs and adopted long short-term memory (LSTM) to recognize sleep stages. Seo et al. [21] brought forward the intra- and inter-epoch temporal context network (IITNet), composed of a deep residual network and two layers of bi-directional LSTM (BiLSTM), to extract time-invariant features from single-channel EEG epochs and learn the sleep transition rules for distinguishing sleep stages. Phan et al. [22] came up with SleepTransformer, which extracts intra-epoch features from each 30-second EEG epoch and learns inter-epoch temporal representations from these epoch-wise features to separate sleep stages.
Nevertheless, because humans spend different amounts of time in different sleep stages, the numbers of signal samples of the sleep stages are usually unequal. Therefore, the class imbalance problem needs to be addressed for sleep staging [12], [23]. Recently, some studies have suggested using data augmentation to balance the class distribution of sleep datasets [24], [25]. Data augmentation approaches usually generate synthetic samples of the minority classes from existing samples at the expense of computational time. Other studies recommend applying cost-sensitive learning to penalize the misclassification of minority classes, which, however, sacrifices the classification rate on the majority classes [17].
In this paper, we propose a novel and effective method named SleepFC for sleep staging based on single-channel EEG. The main contributions of SleepFC are summarized as follows.
The proposed SleepFC has a new architecture, which consists of convolutional feature pyramid network (CFPN), cross-scale temporal context learning (CSTCL), and class adaptive fine-tuning loss function (CAFTLF) based classification network, as illustrated in Fig. 1.
In SleepFC, CFPN takes charge of learning a feature pyramid of salient waves; CSTCL is responsible for capturing the multi-scale sleep transition rules between successive sleep stages; the CAFTLF-based classification network plays the role of resolving the class imbalance problem for sleep staging, without incurring extra computational expense or compromising the classification rate on the majority classes.
Extensive experiments on three public benchmark datasets demonstrate the superiority of SleepFC over the related state-of-the-arts for sleep staging based on single-channel EEG.
Fig. 1. Overall architecture of SleepFC. At first, CFPN extracts the multi-scale features of salient waves from successive EEG epochs. Then, CSTCL learns to capture the sleep stage transition rules from the extracted multi-scale features. In more detail, CSTCL fuses the multi-scale features by SCL, TDCL and BUCL, and encodes the temporal context information of the fused features via the Transformer encoder. At last, the CAFTLF-based classification network distinguishes the imbalanced classes of sleep stages.
Method
As shown in Fig. 1, the proposed SleepFC comprises three components: CFPN, CSTCL, and the CAFTLF-based classification network. The algorithmic procedure of SleepFC is briefly described as follows. At first, CFPN learns the multi-scale features of salient waves from successive EEG epochs. Then, CSTCL captures the sleep stage transition rules from the multi-scale features. At last, the CAFTLF-based classification network predicts the sleep stages whilst tackling the class imbalance problem.
A. Preliminary
We denote $L$ successive single-channel EEG epochs sampled at $F$ Hz as $\mathbf{X}^{(L)}=\{\mathbf{x}_1,\mathbf{x}_2,\dots,\mathbf{x}_L\}$, where each 30-second epoch $\mathbf{x}_i\in\mathbb{R}^{30F}$, and the task is to predict the sleep stage of the current epoch.
B. Convolutional Feature Pyramid Network
To characterize the intrinsic features of salient waves from EEG signals, CFPN learns the feature pyramid by means of convolutional blocks, max-pooling layers, and convolutional layers.
The feature pyramid consists of three feature maps $\mathbf{F}_3^{(L)}$, $\mathbf{F}_4^{(L)}$, and $\mathbf{F}_5^{(L)}$, which are output by successive levels of CFPN and thus characterize the salient waves at three temporal scales.
C. Cross-Scale Temporal Context Learning
To learn the EEG features for sleep staging, CSTCL captures the multi-scale sleep transition rules by integrating three context learning approaches and one Transformer encoder.
The sleep transition rules have multi-scale characteristics according to the AASM criteria (i.e., short scale: N2-REM; middle scale: N3-N1-N1-N3; long scale: N2-N1-N1-W-N1-W-W; here, “-” means the sleep stage transiting from one to another) [31]. CSTCL learns to capture the multi-scale sleep transition rules from the feature pyramid. Specifically, CSTCL contains top-down context learning, self-context learning, bottom-up context learning, and the Transformer encoder. As the input of CSTCL, the feature pyramid $\{\mathbf{F}_3^{(L)},\mathbf{F}_4^{(L)},\mathbf{F}_5^{(L)}\}$ is processed by the three context learning modules in parallel.
1) Self-Context Learning:
Self-context learning (SCL) extracts the features of salient waves from EEG in different sleep stages, and learns the contextual relationship along the temporal dimension of these features. The attention weight matrix $\mathbf{W}_S$ of SCL is computed via a mixture of $M$ attention components:
\begin{align*} \mathbf{W}_{S} & =\sum_{j=1}^{M} \pi_{j}\, \sigma_{1}\left(\frac{\mathbf{Q}_{j}\mathbf{K}_{j}^{\top}}{\sqrt{d_{k}}}\right), \\ \left[\pi_{1},\pi_{2},\dots,\pi_{M}\right] & =\sigma_{2}\left(\mathbf{w}_{\mathrm{mos},j}^{\top}\bar{\mathbf{K}}\right), \\ \mathbf{Q}_{j} & =\mathbf{K}_{j} =f_{\mathrm{QKS},j}(\mathbf{F}), \\ \mathbf{V} & =f_{\mathrm{VS}}(\mathbf{F}), \tag{1}\end{align*}
where $d_k$ is the key dimension. The output of SCL is then obtained through batch normalization and a residual connection:
\begin{equation*} \tilde{\mathbf{F}}=\mathrm{BN}(\mathbf{W}_{S}\mathbf{V})+\mathbf{F}. \tag{2}\end{equation*}
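For concreteness, a minimal PyTorch sketch of SCL under Eqs. (1)-(2) is given below. The use of softmax for $\sigma_1$ and $\sigma_2$, mean pooling for $\bar{\mathbf{K}}$, and linear projection layers are our assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SelfContextLearning(nn.Module):
    """Sketch of SCL (Eqs. 1-2): mixture-of-softmaxes self-attention
    with batch normalization and a residual connection."""
    def __init__(self, d_model, d_k, n_mix=2):
        super().__init__()
        self.d_k = d_k
        # f_QKS,j: one shared Q/K projection per mixture component
        self.qk_proj = nn.ModuleList(
            nn.Linear(d_model, d_k) for _ in range(n_mix))
        self.v_proj = nn.Linear(d_model, d_model)  # f_VS
        self.w_mos = nn.Linear(d_k, 1)             # mixture gate w_mos
        self.bn = nn.BatchNorm1d(d_model)          # BN in Eq. (2)

    def forward(self, f):  # f: (batch, time, d_model)
        gates, attns = [], []
        for proj in self.qk_proj:
            qk = proj(f)                                     # Q_j = K_j
            attns.append(torch.softmax(
                qk @ qk.transpose(1, 2) / self.d_k ** 0.5, dim=-1))
            gates.append(self.w_mos(qk.mean(dim=1)))         # from K-bar
        pi = torch.softmax(torch.cat(gates, dim=-1), dim=-1)  # (batch, M)
        w_s = sum(pi[:, j, None, None] * a for j, a in enumerate(attns))
        out = w_s @ self.v_proj(f)                           # W_S V
        out = self.bn(out.transpose(1, 2)).transpose(1, 2)
        return out + f                                       # Eq. (2)
```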
2) Top-Down Context Learning:
Top-down context learning (TDCL) adopts a top-down attention mechanism, which fuses the global information of the high-level feature map $\mathbf{F}_h$ into the low-level feature map $\mathbf{F}_l$.
The pipeline of TDCL is briefly described in the following. First, we apply three convolutional layers $f_{\mathrm{QT}}$, $f_{\mathrm{KT}}$, and $f_{\mathrm{VT}}$ to generate the query matrix $\mathbf{Q}$ from $\mathbf{F}_l$, and the key and value matrices $\mathbf{K}$ and $\mathbf{V}$ from $\mathbf{F}_h$.
Next, we calculate the scaled dot product between $\mathbf{Q}$ and $\mathbf{K}$, apply it to $\mathbf{V}$, and transform the result with a convolutional layer $\mathrm{Conv}_T$:
\begin{align*} \tilde{\mathbf{F}}_{l} & =\mathrm{Conv}_{T}\left(\frac{\mathbf{Q}\mathbf{K}^{\top}}{d_{t,h}}\mathbf{V}\right), \\ \mathbf{Q} & =f_{\mathrm{QT}}(\mathbf{F}_{l}), \\ \mathbf{K} & =f_{\mathrm{KT}}(\mathbf{F}_{h}), \\ \mathbf{V} & =f_{\mathrm{VT}}(\mathbf{F}_{h}), \tag{3}\end{align*}
where $d_{t,h}$ is the scaling factor.
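A minimal sketch of TDCL under Eq. (3) follows, assuming $1\times 1$ convolutions as the projections $f_{\mathrm{QT}}$, $f_{\mathrm{KT}}$, and $f_{\mathrm{VT}}$; note that Eq. (3) scales the raw dot product by $d_{t,h}$ rather than applying a softmax, and the sketch follows the formula.

```python
import torch
import torch.nn as nn

class TopDownContextLearning(nn.Module):
    """Sketch of TDCL (Eq. 3): global context of the high-level map
    F_h is fused into the low-level map F_l via cross-attention."""
    def __init__(self, c_low, c_high, d_k):
        super().__init__()
        self.d_k = d_k
        self.q_proj = nn.Conv1d(c_low, d_k, 1)     # f_QT
        self.k_proj = nn.Conv1d(c_high, d_k, 1)    # f_KT
        self.v_proj = nn.Conv1d(c_high, c_low, 1)  # f_VT
        self.conv_t = nn.Conv1d(c_low, c_low, 3, padding=1)  # Conv_T

    def forward(self, f_l, f_h):  # (B, c_low, T_l), (B, c_high, T_h)
        q = self.q_proj(f_l).transpose(1, 2)          # (B, T_l, d_k)
        k = self.k_proj(f_h).transpose(1, 2)          # (B, T_h, d_k)
        v = self.v_proj(f_h).transpose(1, 2)          # (B, T_h, c_low)
        ctx = (q @ k.transpose(1, 2) / self.d_k) @ v  # (QK^T / d) V
        return self.conv_t(ctx.transpose(1, 2))       # F~_l
```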
3) Bottom-Up Context Learning:
Bottom-up context learning (BUCL) fuses the local information of the low-level feature map $\mathbf{F}_l$ into the high-level feature map $\mathbf{F}_h$. First, the query, key, and value matrices are generated by three convolutional layers:
\begin{align*} \mathbf{Q} & =f_{\mathrm{QB}}(\mathbf{F}_{h}), \\ \mathbf{K} & =f_{\mathrm{KB}}(\mathbf{F}_{l}), \\ \mathbf{V} & =f_{\mathrm{VB}}(\mathbf{F}_{l}). \tag{4}\end{align*}
Next, we process the key matrix $\mathbf{K}$ derived from the low-level feature map by the channel-wise attention operation:
\begin{equation*} \mathbf{w}_{c}=\mathrm{ReLU}(\mathrm{GAP}(\mathbf{K})), \tag{5}\end{equation*}
where $\mathrm{GAP}(\cdot)$ denotes global average pooling along the temporal dimension.
Then, we calculate the Hadamard product between $\mathbf{Q}$ and the broadcast weight matrix $\mathbf{W}_c$, add $\mathbf{V}$, and apply the ReLU activation:
\begin{align*} \tilde{\mathbf{F}}_{h} & =\mathrm{ReLU}(\mathbf{Q}\odot\mathbf{W}_{c} + \mathbf{V}), \\ \mathbf{W}_{c} & =[\mathbf{w}_{c},\mathbf{w}_{c},\dots,\mathbf{w}_{c}]_{1\times d_{k}}, \tag{6}\end{align*}
where $\mathbf{W}_c$ repeats $\mathbf{w}_c$ for $d_k$ times.
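A minimal sketch of BUCL under Eqs. (4)-(6) is given below; we assume $f_{\mathrm{VB}}$ downsamples $\mathbf{F}_l$ so that $\mathbf{V}$ matches the temporal length of $\mathbf{Q}$, which Eq. (6) implicitly requires.

```python
import torch
import torch.nn as nn

class BottomUpContextLearning(nn.Module):
    """Sketch of BUCL (Eqs. 4-6): local detail of the low-level map
    F_l is fused into the high-level map F_h via channel attention."""
    def __init__(self, c_low, c_high, stride):
        super().__init__()
        self.q_proj = nn.Conv1d(c_high, c_high, 1)  # f_QB
        self.k_proj = nn.Conv1d(c_low, c_high, 1)   # f_KB
        self.v_proj = nn.Sequential(                # f_VB (assumed)
            nn.Conv1d(c_low, c_high, 1),
            nn.AvgPool1d(kernel_size=stride, stride=stride))

    def forward(self, f_h, f_l):  # (B, c_high, T_h), (B, c_low, T_l)
        q = self.q_proj(f_h)
        k = self.k_proj(f_l)
        v = self.v_proj(f_l)                            # downsampled to T_h
        w_c = torch.relu(k.mean(dim=-1, keepdim=True))  # Eq. (5): ReLU(GAP(K))
        return torch.relu(q * w_c + v)                  # Eq. (6)
```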
4) Transformer Encoder:
By performing SCL, TDCL, and BUCL on the feature pyramid $\{\mathbf{F}_3^{(L)},\mathbf{F}_4^{(L)},\mathbf{F}_5^{(L)}\}$, we obtain the fused feature maps:
\begin{align*} \tilde{\mathbf{F}}_{3}^{(L)} & =\mathrm{Conv}(\mathrm{Concat}(\mathbf{F}_{3}^{(L)},\tilde{\mathbf{F}}_{l(5,3)}^{(L)},\tilde{\mathbf{F}}_{l(4,3)}^{(L)},\tilde{\mathbf{F}}_{(3,3)}^{(L)})), \\ \tilde{\mathbf{F}}_{4}^{(L)} & =\mathrm{Conv}(\mathrm{Concat}(\mathbf{F}_{4}^{(L)},\tilde{\mathbf{F}}_{l(5,4)}^{(L)},\tilde{\mathbf{F}}_{(4,4)}^{(L)},\tilde{\mathbf{F}}_{h(3,4)}^{(L)})), \\ \tilde{\mathbf{F}}_{5}^{(L)} & =\mathrm{Conv}(\mathrm{Concat}(\mathbf{F}_{5}^{(L)},\tilde{\mathbf{F}}_{(5,5)}^{(L)},\tilde{\mathbf{F}}_{h(4,5)}^{(L)},\tilde{\mathbf{F}}_{h(3,5)}^{(L)})), \tag{7}\end{align*}
where the subscripts $l(\cdot,\cdot)$, $h(\cdot,\cdot)$, and $(\cdot,\cdot)$ denote the TDCL, BUCL, and SCL outputs from one pyramid level to another, respectively.
Finally, we add the positional encodings $\mathbf{P}_i^{(L)}$ to the fused feature maps and encode the temporal context information of the resulting sequences via the Transformer encoder:
\begin{align*} \mathbf{E}_{i}^{(L)} & = \mathrm{TransformerEncoder}(\tilde{\mathbf{P}}_{i}^{(L)}), \\ \tilde{\mathbf{P}}_{i}^{(L)} & =\tilde{\mathbf{F}}_{i}^{(L)}+\mathbf{P}_{i}^{(L)}, \tag{8}\end{align*}
where $i\in\{3,4,5\}$.
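A compact sketch of Eqs. (7)-(8) for one pyramid level follows; the $1\times 1$ fusion convolution and the Transformer hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossScaleFusion(nn.Module):
    """Sketch of Eqs. (7)-(8): concatenate one pyramid level with its
    three context-learned counterparts, fuse with a 1x1 conv, add
    positional encodings, and encode with a Transformer encoder."""
    def __init__(self, d_model, n_head=8, n_layers=2):
        super().__init__()
        self.fuse = nn.Conv1d(4 * d_model, d_model, 1)  # Conv in Eq. (7)
        layer = nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, f, ctx_sc, ctx_td, ctx_bu, pos):
        # f and ctx_*: (B, d_model, T); pos: (B, T, d_model)
        fused = self.fuse(torch.cat([f, ctx_sc, ctx_td, ctx_bu], dim=1))
        p = fused.transpose(1, 2) + pos                 # Eq. (8): F~ + P
        return self.encoder(p)                          # E_i
```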
D. CAFTLF-Based Classification Network
The CAFTLF-based classification network predicts the sleep stages whilst handling the class imbalance via the attention mechanism and the loss function CAFTLF in a two-stage training process.
In the CAFTLF-based classification network, we fuse the encoded feature map $\mathbf{E}_i^{(L)}$ along the temporal dimension via the attention mechanism:
\begin{equation*} \tilde{\mathbf{e}}_{i} = \sum_{t=1}^{T} \alpha_{i,t}\mathbf{a}_{i,t}, \tag{9}\end{equation*}
where
\begin{align*} \mathbf{a}_{i,t} & =\tanh\left(\mathbf{W}\mathbf{e}_{i,t}^{(L)}+\mathbf{b}\right), \\ \alpha_{i,t} & =\frac{\exp\left(\mathbf{a}_{i,t}^{\top}\mathbf{w}_{\alpha}\right)}{\sum_{t=1}^{T}\exp\left(\mathbf{a}_{i,t}^{\top}\mathbf{w}_{\alpha}\right)}, \tag{10}\end{align*}
in which $\mathbf{e}_{i,t}^{(L)}$ is the $t$-th temporal slice of $\mathbf{E}_i^{(L)}$, and $\mathbf{W}$, $\mathbf{b}$, and $\mathbf{w}_\alpha$ are learnable parameters.
Finally, the predicted sleep stage $\hat{y}$ is obtained by summing the class score vectors $\mathbf{O}_i$ of the three scales and taking the class with the maximum score:
\begin{equation*} \hat{y}=\mathrm{argmax}\left(\sum_{i \in \{3,4,5\}} \mathbf{O}_{i}\right), \tag{11}\end{equation*}
where $\mathbf{O}_i$ denotes the class scores predicted from $\tilde{\mathbf{e}}_i$.
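A minimal sketch of Eqs. (9)-(11) is given below; the per-scale linear classification heads producing $\mathbf{O}_i$ are an assumption.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Sketch of Eqs. (9)-(10): temporal attention pooling over the
    encoded features of one scale."""
    def __init__(self, d_model, d_attn):
        super().__init__()
        self.proj = nn.Linear(d_model, d_attn)           # W, b in Eq. (10)
        self.w_alpha = nn.Linear(d_attn, 1, bias=False)  # w_alpha

    def forward(self, e):  # e: (B, T, d_model)
        a = torch.tanh(self.proj(e))                   # Eq. (10)
        alpha = torch.softmax(self.w_alpha(a), dim=1)  # (B, T, 1)
        return (alpha * a).sum(dim=1)                  # Eq. (9)

# Eq. (11): sum the per-scale class scores, then take the argmax, e.g.
#   scores = sum(head[i](pooled[i]) for i in range(3))  # O_3 + O_4 + O_5
#   y_hat = scores.argmax(dim=-1)
```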
In the classification network, we employ a piecewise loss function to address the class imbalance problem of sleep stages. The training process of SleepFC consists of two stages. In the first stage, we utilize the standard multi-class cross-entropy [34] as the loss function; in the second stage, we devise the loss function CAFTLF as
\begin{align*} \mathcal{L}_{\mathrm{CAFTLF}} & =-\frac{1}{S} \sum_{i\in\{3,4,5\}} \sum_{s=1}^{S} \sum_{k=1}^{K} w_{k}\, y_{i,s}^{k} \log\left(\widehat{y}_{i,s}^{k}\right), \tag{12}\\ w_{k} & =\begin{cases} 1+\mu_{(y_{s},\widehat{y}_{s})} \cdot \max\left(1, \log\left(\frac{S}{S_{k}}\right)\right), & \mathrm{if}\;\; y_{s} \neq \widehat{y}_{s}, \\ 1, & \mathrm{if}\;\; y_{s} = \widehat{y}_{s}, \end{cases} \tag{13}\end{align*}
where $S$ is the total number of training samples, $S_k$ is the number of samples in class $k$, $K$ is the number of classes, and $\mu_{(y_s,\widehat{y}_s)}$ modulates the penalty on misclassified samples.
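A minimal sketch of the second-stage loss for one scale follows (the full loss in Eq. (12) sums over the three scales); treating $\mu$ as a scalar hyperparameter is our simplifying assumption.

```python
import torch
import torch.nn.functional as F

def caftlf_loss(logits, targets, class_counts, mu=1.0):
    """Sketch of Eqs. (12)-(13): cross-entropy where misclassified
    samples are up-weighted by the rarity log(S/S_k) of their class.
    logits: (S, K); targets: (S,); class_counts: (K,) tensor."""
    s_total = class_counts.sum().float()
    rarity = torch.clamp(torch.log(s_total / class_counts.float()),
                         min=1.0)                     # max(1, log(S/S_k))
    with torch.no_grad():
        wrong = logits.argmax(dim=-1) != targets
        w = torch.ones_like(targets, dtype=torch.float)
        w[wrong] = 1.0 + mu * rarity[targets[wrong]]  # Eq. (13)
    ce = F.cross_entropy(logits, targets, reduction="none")
    return (w * ce).mean()                            # Eq. (12), one scale
```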
When training SleepFC, an early stopping technique is employed to reduce the overfitting risk and enhance the generalization performance. In the training process, once the validation loss stops decreasing for a certain number of consecutive validation rounds (i.e., the early stopping patience), the training is terminated.
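The early stopping scheme can be summarized by the following sketch; the training and validation callables are placeholders, not the authors' API.

```python
def train_with_early_stopping(model, train_step, validate,
                              patience, max_iters, val_period=500):
    """Stop once the validation loss has not improved for `patience`
    consecutive validation rounds, then restore the best weights."""
    best_loss, best_state, stale = float("inf"), None, 0
    for it in range(1, max_iters + 1):
        train_step(model)
        if it % val_period == 0:
            val_loss = validate(model)
            if val_loss < best_loss:
                best_loss, stale = val_loss, 0
                best_state = {k: v.clone()
                              for k, v in model.state_dict().items()}
            else:
                stale += 1
                if stale >= patience:
                    break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```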
Experiments
A. Datasets
We evaluate our proposed method, SleepFC, on three public benchmark datasets: SleepEDF-20 [26], SleepEDF-78 [27], and ISRUC-S3 [35], whose critical characteristics have been summarized in Table I.
SleepEDF-20: SleepEDF-20 comprises 10 male and 10 female subjects aged from 25 to 34 years old without sleep disorders. Two consecutive nights of PSG recordings were collected from each subject, except that one recording of subject 13 was lost due to device failure. Based on the Rechtschaffen and Kales criteria [6], sleep experts manually annotated the PSG recordings in 30-second sleep epochs and categorized the sleep epochs into eight classes: MOVEMENT, UNKNOWN, WAKE, N1, N2, N3, N4, and REM.
SleepEDF-78: SleepEDF-78 is the Sleep-EDF Expanded dataset (version 2013), consisting of 78 healthy subjects aged from 25 to 101. Each subject underwent two consecutive nights of PSG sleep recording, except for subjects 13, 36, and 52, each of whom lost one recording due to device failure. Every sleep epoch was categorized into the same eight classes as in SleepEDF-20.
ISRUC-S3: ISRUC-S3 contains the PSG recordings of 10 healthy subjects (9 males and 1 female). Each recording lasted continuously for 8 hours at a sampling frequency of 200 Hz and includes 6 EEG channels, 1 ECG channel, 3 EMG channels, and 2 EOG channels. According to the AASM criteria, sleep experts categorized these PSG signals into five sleep stages: WAKE, N1, N2, N3, and REM.
B. Experimental Settings
For each dataset, we use a single channel of the original EEG; the ISRUC-S3 signals are additionally downsampled to 100 Hz. In experiments, we use the Fpz-Cz channel of EEG from SleepEDF-20 and SleepEDF-78, and the C4-A1 channel of EEG from ISRUC-S3 for method evaluation. The MOVEMENT class refers to physical activity during sleep, and the movement artifacts that cannot be scored at the beginning and end of each subject's recording are labeled as UNKNOWN [36]. Because these two classes do not represent any specific sleep stage, we exclude them before the experiments [6], [31], [37]. Moreover, according to the AASM criteria, we merge the N3 and N4 stages into N3 for classification [6], [17], [38], [39]. Besides, we keep 30 minutes of the WAKE periods before and after the sleep period as the WAKE stage [40].
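The preprocessing described above can be sketched as follows; the stage-label strings and array layout are hypothetical, not the actual EDF annotation codes.

```python
import numpy as np

STAGE_MAP = {"W": 0, "N1": 1, "N2": 2, "N3": 3, "N4": 3, "REM": 4}  # N4 -> N3
EXCLUDE = {"MOVEMENT", "UNKNOWN"}

def preprocess(epochs, labels, wake_margin=60):
    """epochs: (N, 3000) 30-s epochs at 100 Hz; labels: N stage strings.
    Drops unscored classes, merges N3/N4, and keeps 30 min of WAKE
    (60 epochs) on each side of the sleep period."""
    keep = [i for i, y in enumerate(labels) if y not in EXCLUDE]
    x = epochs[keep]
    y = np.array([STAGE_MAP[labels[i]] for i in keep])
    sleep = np.where(y != STAGE_MAP["W"])[0]        # non-WAKE epochs
    lo = max(sleep[0] - wake_margin, 0)
    hi = min(sleep[-1] + wake_margin, len(y) - 1)
    return x[lo:hi + 1], y[lo:hi + 1]
```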
We follow the widely used evaluation protocols for method evaluation [6], [17], [22]. The evaluation protocols on the different datasets are described in Table I. It is worth mentioning that, in experiments, the validation set is randomly selected from the training set and is independent of the testing set. Besides, we adopt three metrics to evaluate the method performance: accuracy (ACC), macro F1-score (MF1), and Cohen's Kappa (κ).
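The three metrics can be reproduced straightforwardly with scikit-learn, given the concatenated per-epoch labels and predictions over the test set:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

def evaluate(y_true, y_pred):
    """Compute the three evaluation metrics used in this paper."""
    return {
        "ACC": accuracy_score(y_true, y_pred),
        "MF1": f1_score(y_true, y_pred, average="macro"),
        "Kappa": cohen_kappa_score(y_true, y_pred),
    }
```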
The parameter settings of SleepFC are given in the following. L is set as 10, which means that one current and nine previous adjacent EEG epochs are used as the input data of SleepFC. In each convolutional block of CFPN, every convolutional layer uses a kernel size of 3, a stride of 1, and a padding of 1; every max-pooling layer uses a kernel size of 5 and a stride of 5. In CFPN, the output channel number of the convolutional layers determines the feature dimension of each level of the pyramid.
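Under these settings, one CFPN convolutional block can be sketched as below; the channel widths and the use of batch normalization are illustrative assumptions.

```python
import torch.nn as nn

def cfpn_block(in_ch, out_ch):
    """One CFPN block: conv (kernel 3, stride 1, padding 1) followed
    by max-pooling (kernel 5, stride 5), shrinking time by 5x."""
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool1d(kernel_size=5, stride=5),
    )

# Stacking blocks yields the pyramid: a 30-s epoch at 100 Hz
# (3000 samples) shrinks to 600 -> 120 -> 24 over three blocks.
```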
On SleepEDF-20 and SleepEDF-78, SleepFC is evaluated on the validation set every 500 training iterations (i.e., the validation period is set as 500 iterations).
C. Feature Evaluation
To evaluate the performance gain brought by CFPN, we compare the feature extraction components of SleepFC, U-Time, XSleepNet, and AttnSleep, with and without the feature pyramid design, on SleepEDF-78. For a fair comparison, each feature extraction component is followed by the same subsequent components, i.e., the Transformer encoder and the CAFTLF-based classification network of SleepFC.
U-Time [15] is a fully convolutional network for sleep staging. U-Time has an encoder-decoder structure, where the encoder extracts features and the decoder performs time series segmentation. In our experiments, we only utilize the encoder to extract EEG features directly from the raw EEG signal.
XSleepNet [6] is a sequence-to-sequence bidirectional RNN for sleep staging. XSleepNet is composed of two network streams: one for processing raw signals and the other for processing time-frequency images. In our experiments, we only use the former stream to extract features, considering its suitability for EEG.
AttnSleep [17] is an attention-based deep learning approach for sleep staging using single-channel EEG. The feature extraction component of AttnSleep is a multi-resolution convolutional neural network (MRCNN), which is bifurcated into two distinct branches. The low-resolution branch extracts low-frequency features, and the high-resolution branch extracts high-frequency features. The features from the two branches are then concatenated as the extracted features.
From Table II, we can see that, with the feature pyramid, the overall ACC, MF1, and κ are improved, which verifies the benefit of the feature pyramid design adopted by CFPN.
D. Method Comparison
Table III presents the confusion matrices of SleepFC for sleep stage classification on SleepEDF-20, SleepEDF-78, and ISRUC-S3. From these confusion matrices, we can observe that the class imbalance problem has a big influence on the performance of SleepFC. Specifically, the easiest case is identifying the sleep stage W, which belongs to the majority class of the long-tailed distribution, on all the datasets, while the hardest case is classifying the sleep stage N1, which belongs to the minority class at the other end of the distribution.
Moreover, we compare our proposed SleepFC with the state-of-the-art approaches in Table IV. We directly report the results of the compared methods under the same single-channel EEG input setting, as given in their original papers.
From Table IV, we can see that SleepFC performs the best for sleep staging in terms of ACC, MF1, and κ.
Furthermore, we report the model size of SleepFC in Table V. Although the performance advantage of SleepFC over XSleepNet is less pronounced than over the other compared approaches, SleepFC has fewer parameters and requires fewer EEG epochs.
E. Model Ablation
We carry out an ablation study on SleepEDF-20 to validate the rationality and effectiveness of the key components CFPN, CSTCL, and CAFTLF of SleepFC. The following four experiments are conducted:
Ablation on CFPN: CFPN and CAFTLF-disabled classification network with the first training stage.
Ablation on CFPN+CAFTLF: CFPN and CAFTLF-based classification network with the two-stage training.
Ablation on CFPN+CSTCL: CFPN, CSTCL and CAFTLF-disabled classification network with the first training stage.
Ablation on CFPN+CSTCL+CAFTLF: CFPN, CSTCL and CAFTLF-based classification network with the two-stage training, i.e., SleepFC.
From Fig. 2, we can see that CSTCL enables SleepFC to capture the informative multi-scale transition rules between sleep stages, thus boosting the performance of SleepFC. These results reveal the value of this context learning component for sleep staging. By comparing CFPN with CFPN+CAFTLF, as well as CFPN+CSTCL with CFPN+CSTCL+CAFTLF, we can find that the CAFTLF-based classification network not only enhances the overall ACC, MF1, and κ, but also improves the classification rate on the minority classes without compromising that on the majority classes.
F. Sensitivity Analysis
1) Evaluation on the Number of EEG Epochs:
We evaluate the influence of the number of EEG epochs, denoted as L, on the performance of SleepFC by setting L to 1, 2, 5, 10, and 20. The results under three evaluation metrics, as illustrated in Fig. 3, reveal that SleepFC achieves its peak performance on SleepEDF-20, SleepEDF-78, and ISRUC-S3 when L is set as 10, while either increasing or decreasing L causes a performance decline. This is mainly because lower values of L cannot offer sufficient temporal context information for CSTCL to learn discriminative feature maps, whereas higher values of L introduce redundant context that interferes with discriminative feature learning.
Fig. 3. Evaluation on the number of EEG epochs as the input for SleepFC using SleepEDF-20, SleepEDF-78 and ISRUC-S3: (a) the results of SleepFC under ACC; (b) the results of SleepFC under MF1; (c) the results of SleepFC under Cohen's Kappa.
2) Evaluation on the Convolution Kernel Size of CFPN:
We evaluate the influence of the convolution kernel size of CFPN, denoted as K, on the performance of SleepFC by varying K from 1 to 9. Observing the results of SleepFC under three evaluation metrics on ISRUC-S3 in Fig. 4, we find that the performance of SleepFC fluctuates as K increases, exhibiting more than one peak. A plausible explanation is as follows. A larger convolution kernel enables CFPN to encode rich and varied information, which benefits robustness at the expense of discriminability; a smaller kernel enables CFPN to encode detailed and typical information, which benefits discriminability at the cost of robustness. To generalize well, CFPN should balance discriminability and robustness in feature learning. Moreover, the signals of different sleep stages have different characteristics and thus favor different kernel sizes. Therefore, SleepFC exhibits a fluctuating performance as K increases. However, as shown in Fig. 4(d), a larger kernel also increases the parameter amount and computational complexity of SleepFC. Considering this, we recommend a moderate kernel size that trades off classification performance against model complexity.
Fig. 4. Evaluation on the convolution kernel size of CFPN in SleepFC using ISRUC-S3: (a) the results of SleepFC under ACC; (b) the results of SleepFC under MF1; (c) the results of SleepFC under Cohen's Kappa; (d) the parameter amount of SleepFC.
3) Evaluation on the Concatenation Order of Feature Maps:
We evaluate the influence of the concatenation order of the feature maps in Eq. (7) on the performance of SleepFC on ISRUC-S3.
Evaluation on the concatenation order of feature maps in SleepFC using ISRUC-S3: (a) the results of SleepFC under ACC; (b) the results of SleepFC under MF1; (c) the results of SleepFC under Cohen's Kappa.
4) Evaluation on the Scale of Learned Representation:
We evaluate the effectiveness of each scale of learned representation output by SleepFC on ISRUC-S3. For convenience, we denote the three scales of learned representations as $\mathbf{E}_3^{(L)}$, $\mathbf{E}_4^{(L)}$, and $\mathbf{E}_5^{(L)}$.
G. Significance Test
We evaluate the statistical significance of the performance improvement of SleepFC over the three related advanced methods AttnSleep, DeepSleepNet, and XSleepNet by means of the paired Wilcoxon signed-rank test. To be specific, we assess the p-values of the ACC, MF1, and κ improvements of SleepFC over each compared method.
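The paired test can be reproduced with SciPy; the per-fold scores below are illustrative placeholders, not the reported results.

```python
from scipy.stats import wilcoxon

ours     = [0.85, 0.84, 0.86, 0.83, 0.87, 0.85, 0.84, 0.86, 0.85, 0.83]
baseline = [0.82, 0.83, 0.84, 0.81, 0.85, 0.83, 0.82, 0.84, 0.83, 0.81]
stat, p = wilcoxon(ours, baseline)   # paired Wilcoxon signed-rank test
print(f"p = {p:.4f}")                # p < 0.05 -> significant improvement
```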
As recorded in Table VII, in almost all cases, SleepFC achieves obvious performance improvements over the compared approaches, and the p-values for these improvements are far below the significance level of 0.05. These results evidence the statistical significance of the performance superiority of SleepFC for sleep staging.
Conclusion
In this paper, we have proposed SleepFC, a novel method for single-channel-EEG-based sleep staging. SleepFC can effectively extract and fuse the representative features of salient waves from EEG epochs, learn the informative multi-scale transition rules among sleep stages, and competently tackle the serious class imbalance problem inherent in this task. Experimental results on three public benchmark datasets have demonstrated the superiority of the proposed method over the related state-of-the-art approaches. In future work, we will incorporate an appropriate transfer learning strategy into SleepFC to handle the thorny problem of the cross-subject domain gap, so as to further enhance the performance of our model.