Narcolepsy Diagnosis With Sleep Stage Features Using PSG Recordings

Narcolepsy is a sleep disorder that affects millions of people worldwide and causes serious public health problems. It is hard for doctors to diagnose narcolepsy correctly and objectively. Polysomnography (PSG) recordings, the gold standard for sleep monitoring and quality measurement, can provide abundant and objective cues for narcolepsy diagnosis. Several studies have addressed automatic narcolepsy diagnosis using PSG recordings. However, sleep stage information, an important cue for narcolepsy diagnosis, has not been fully utilized. Some studies do not consider sleep stage information at all. Others consider it, but the stages are manually scored by experts, which is time-consuming and subjective. Moreover, the framework that uses automatically scored sleep stages for narcolepsy diagnosis is designed in a two-phase manner, with sleep staging in the first phase and diagnosis in the second, which causes cumulative error and degrades performance. To address these challenges, we propose a novel end-to-end framework for automatic narcolepsy diagnosis using PSG recordings. In particular, adopting the idea of multi-task learning, we take sleep staging as an auxiliary task and combine the sleep stage related features with narcolepsy related features for our primary task of narcolepsy diagnosis. We collected a dataset of PSG recordings from 77 participants and evaluated our framework on it. Both the sleep stage features and the end-to-end design contribute to diagnosis performance. Moreover, we conduct a comprehensive analysis of the relationship between sleep stages and narcolepsy, the correlation of different channels, the predictive ability of different sensing data, and diagnosis results at the subject level.


I. INTRODUCTION
Sleep plays a critical role in promoting mental and physical health [1], [2]. Problems with the quality, timing, and amount of sleep severely interfere with normal physical, mental, social, and emotional functioning. Such problems are brought about by sleep disorders, which affect millions of people worldwide and cause serious public health problems [3]. About 50 to 70 million people in America suffer from a chronic sleep or wakefulness disorder [4], such as narcolepsy, insomnia, restless legs syndrome, or sleep apnea. Among these disorders, narcolepsy, characterized by excessive daytime sleepiness and brief episodes of involuntary sleep, may severely interfere with work or social commitments in daily life [1]. Patients suffer from the sudden onset of irresistible urges to sleep. Meanwhile, about 70% of affected patients also experience episodes of sudden loss of muscle strength, known as cataplexy [5]. Moreover, narcolepsy tends to occur in relatively young people, with onset peaking at around 15 and 36 years of age [6]. It is extremely harmful to young people's physical and mental health and can even lead to a variety of complications, such as depression, mania, bipolar disorder, and schizophrenia. Given how harmful narcolepsy is, timely diagnosis is critical to protect patients' mental and physical health.
Early diagnosis of narcolepsy is typically based on the presenting symptoms. In clinical practice, doctors usually determine subjectively whether a person has narcolepsy through direct inquiries or questionnaires. This can lead to misdiagnosis, because narcolepsy shares some symptoms with other sleep disorders and patients may not describe their symptoms accurately and objectively. In fact, since people with narcolepsy are often misdiagnosed with other conditions, such as psychiatric disorders or emotional problems, it can take years for someone to receive the proper diagnosis [6]. Given the difficulty of diagnosing narcolepsy, a comprehensive, objective, and high-quality approach is urgently needed. With the development of biomedical engineering and sleep medicine, polysomnography (PSG) in hospitals or sleep centers has become the most effective way to understand the sleep status of subjects. PSG consists of the electroencephalogram (EEG), electrooculogram (EOG), electromyogram (EMG), and other physiological signals (e.g., electrocardiogram (ECG), nasal pressure, and body position). PSG recordings are typically segmented into 30-second epochs, each of which is manually assigned a sleep stage by an expert or technician. This sleep staging process follows the rules of the American Academy of Sleep Medicine (AASM) standard [7], which defines five sleep stages: Wake (W), rapid eye movement (REM), and three non-REM stages (N1, N2, N3). A real example of PSG signals from our dataset is given in Fig. 1, where EEG, EOG, and EMG signals in each sleep stage are presented. Given the richness and objectivity of its recordings, PSG has been considered the gold standard for sleep monitoring [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19]. Some studies [20], [21], [22], [23] have adopted traditional machine learning methods for automatic sleep staging from PSG, and others [11], [14], [24], [25], [26], [27], [28] have proposed deep learning models to predict sleep stages from PSG. PSG likewise provides abundant and objective cues for narcolepsy diagnosis.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Several studies have used PSG signals for sleep disorder diagnosis [29], [30], [31], including narcolepsy. Some studies [30], [32] extracted hand-crafted features from PSG recordings and then fed them into traditional machine learning classifiers (e.g., random forest) to identify narcolepsy. These approaches miss the sequential relationships within and between epochs, which are important given the sequential nature of sleep. With the development of deep learning, deep neural networks have been used for narcolepsy diagnosis. However, sleep stage information, an important cue for narcolepsy diagnosis, has still not been fully utilized. First, some studies do not consider sleep stage information when diagnosing narcolepsy. Differences in sleep stage labels could serve as potential biomarkers for classifying narcolepsy [33], and PSG signals belonging to different sleep stages perform differently in narcolepsy diagnosis [31]. However, some studies ignore such information and directly use PSG recordings for narcolepsy diagnosis [30], [34]. Second, although some studies consider sleep stage information for diagnosis, the sleep stages are manually scored by experts, which is time-consuming and requires a large amount of human labor. In these studies, sleep stage labels are first manually assigned and then combined with PSG recordings for narcolepsy diagnosis [31], [33], [35]. Third, although some studies score sleep stages automatically, their methods for narcolepsy diagnosis are designed in a two-phase manner: sleep stages are scored in the first phase and then combined with PSG signals for diagnosis in the second phase [36]. The sleep stages scored in the first phase contain incorrect labels that cannot be corrected in the second phase, causing cumulative error and degrading the performance of disorder diagnosis.
To address the limitations mentioned above, we propose a novel end-to-end framework for narcolepsy diagnosis from PSG signals. We automatically score sleep stages and then take advantage of them for narcolepsy diagnosis by adopting the idea of multi-task learning [37]. To evaluate the framework, we collected a dataset of PSG recordings in our partner hospital, consisting of 50 narcolepsy patients and 27 people without disabilities. For convenience, we hereafter refer to narcolepsy patients as "patients" and to people without disabilities as "normals". Compared with other approaches, our framework achieves state-of-the-art performance. Our contributions are as follows: • Considering that PSG recordings are the gold standard for sleep monitoring, we collected a dataset of PSG recordings in the partner hospital from 50 narcolepsy patients and 27 healthy people to analyze the relationship between sleep stages and narcolepsy and to evaluate our method.
In the future, we will release the dataset.
• We design a novel end-to-end framework for automatically diagnosing narcolepsy from PSG recordings by adopting the idea of multi-task learning and setting sleep staging as an auxiliary task. Experimental results show that both the sleep stage related features and the end-to-end design contribute significantly to the performance of narcolepsy diagnosis.

In the collection procedure, each participant stayed in a dedicated ward in the hospital. Before collection, a technician placed multiple sensors on each participant's body: more than 20 wired attachments, including a pulse oximeter, a pressure transducer, a thermocouple, and electrodes, at different positions (such as the head, eyes, nose, chin, and legs). Each subject then lay in bed and gradually fell asleep while the wired attachments collected physiological signals from different parts of the body. The PSG recordings were collected according to the AASM sleep standard [7]. EEG, EOG, ECG, chin EMG, and leg EMG signals were sampled at 512 Hz, which captures fine-grained information in these signals. For each subject, we collected PSG recordings for one whole night, from about 21:00 to 5:00 the next morning, about 8 hours in total. All signals were stored in the standard EDF+ format with the .edf extension. The recordings were segmented into 30-second epochs, and each epoch was manually labeled by sleep experts or technicians according to the AASM standard [7], with labels Wake, N1, N2, N3, REM, MOVEMENT, and UNKNOWN.
To ensure a fair comparison, we first preprocessed the dataset and then evaluated all methods on the same preprocessed data. Some signals, such as EEG, EOG, chin EMG, and ECG, were band-pass and notch filtered. In subsequent experiments, we removed the epochs annotated as MOVEMENT or UNKNOWN.
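The segmentation into 30-second epochs and the removal of MOVEMENT/UNKNOWN epochs can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' pipeline: the function name `segment_and_filter` and the one-label-per-epoch annotation format are hypothetical, and band-pass/notch filtering is omitted because the text does not give the cutoff frequencies.

```python
import numpy as np

FS_HZ = 512                          # sampling rate stated for EEG/EOG/ECG/EMG
EPOCH_SEC = 30                       # AASM epoch length
EXCLUDED = {"MOVEMENT", "UNKNOWN"}   # epochs removed before the experiments


def segment_and_filter(signal, labels, fs=FS_HZ, epoch_sec=EPOCH_SEC):
    """Cut a (C, T) recording into 30-s epochs and drop MOVEMENT/UNKNOWN ones.

    signal: array of shape (C, T), one row per channel.
    labels: one stage string per epoch, in recording order (hypothetical format).
    Returns (epochs, kept_labels); epochs has shape (E, n, C) with n = fs * epoch_sec.
    """
    n = fs * epoch_sec
    num_epochs = min(signal.shape[1] // n, len(labels))
    trimmed = signal[:, : num_epochs * n]            # discard a trailing partial epoch
    # (C, E*n) -> (C, E, n) -> (E, n, C), matching x in R^{n x C}
    epochs = trimmed.reshape(signal.shape[0], num_epochs, n).transpose(1, 2, 0)
    keep = [i for i in range(num_epochs) if labels[i] not in EXCLUDED]
    return epochs[keep], [labels[i] for i in keep]
```

For a roughly 8-hour night at 512 Hz, this yields about 960 epochs of 15,360 samples per channel before exclusion.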

B. Dataset Analysis
To better understand our SSND dataset, we analyze it from several perspectives. The statistics of the REM stage are consistent with the previous finding that patients with narcolepsy typically have higher REM sleep density than normals [38].
To further analyze the relationship between sleep stage distribution and narcolepsy, we conducted a significance test on the number of epochs in each sleep stage between patients and normals, shown in Fig. 2. Here, the p value indicates the difference between patients and normals in each stage, and "Sig" indicates significance. As Fig. 2 shows, the p values in the Wake, N1, N2, N3, and REM stages are 0.1545, 0.0609, 0.5783, 0.0078, and 0.0012, respectively. "Sig:ns" denotes p ≥ 0.05, indicating no significant difference between patients and normals, and "Sig:**" denotes p ≤ 0.01, indicating a significant difference. Compared with the other stages, the differences between patients and normals in the N3 and REM stages are clearly more significant (p = 0.0078 in N3 and p = 0.0012 in REM). This result suggests that patients with narcolepsy are more likely to enter the N3 and REM stages than normals.
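The text does not name the statistical test behind Fig. 2. As one possible choice (an assumption, not necessarily the authors' test), a two-sided permutation test on per-subject epoch counts, together with the "Sig" notation of Fig. 2, could look like the sketch below. The names `permutation_p_value` and `sig_label` are hypothetical, and the "*" level for 0.01 < p < 0.05 follows common convention rather than the text.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    group_a, group_b: per-subject epoch counts for one stage (e.g., REM)
    for patients and normals, respectively.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # random relabeling of subjects
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)              # add-one correction keeps p > 0


def sig_label(p):
    """Map a p value to the "Sig" notation used in Fig. 2."""
    if p <= 0.01:
        return "**"
    if p < 0.05:
        return "*"   # conventional intermediate level (assumed)
    return "ns"
```

Applied to the reported p values (0.1545, 0.0609, 0.5783, 0.0078, 0.0012), `sig_label` reproduces "ns" for Wake, N1, and N2 and "**" for N3 and REM.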
2) Hypnogram Analysis: To further analyze the relationship between sleep stages and narcolepsy, we compare in Fig. 3 two hypnograms manually scored by a sleep expert from the whole-night PSG recordings of a patient and a normal. A hypnogram is a graph that represents the stages of sleep as a function of time; it is usually obtained by scoring the EEG, EOG, and EMG recordings. Fig. 3(a) shows that sleep stage transitions happen frequently in the patient with narcolepsy. In contrast, transitions are less frequent in the normal (Fig. 3(b)). This further suggests that known sleep stages can help diagnose narcolepsy. Therefore, we introduce sleep staging as an auxiliary task [39] in our deep learning model for narcolepsy diagnosis, which helps the primary task of narcolepsy diagnosis and improves its performance.
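The visual comparison of transition frequency in Fig. 3 can be quantified directly from a hypnogram. The helpers below (`count_transitions`, `transitions_per_hour`, `transition_pairs`) are hypothetical illustrations, not part of the described method; they assume a hypnogram is a list of stage strings, one per 30-second epoch.

```python
from collections import Counter

EPOCH_SEC = 30  # each hypnogram entry covers one 30-s epoch


def count_transitions(hypnogram):
    """Number of epoch-to-epoch stage changes in a hypnogram."""
    return sum(1 for a, b in zip(hypnogram, hypnogram[1:]) if a != b)


def transitions_per_hour(hypnogram):
    """Stage changes normalized by recording length in hours."""
    hours = len(hypnogram) * EPOCH_SEC / 3600
    return count_transitions(hypnogram) / hours


def transition_pairs(hypnogram):
    """Counter of (from_stage, to_stage) pairs, excluding self-transitions."""
    return Counter((a, b) for a, b in zip(hypnogram, hypnogram[1:]) if a != b)
```

Normalizing by hours makes recordings of slightly different length comparable; the pair counts show which transitions (e.g., Wake to REM) dominate.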
3) Correlation Analysis of Different Channels: In previous work, EEG, EOG, EMG, and ECG have frequently been given higher importance than other signals. Here, we investigate the correlation between different modalities by calculating the Pearson correlation coefficient between signals from 13 important channels of EEG, EOG, EMG, and ECG. The heatmaps of the Pearson correlation coefficients are shown in Fig. 4. First, the heatmaps of all subjects, patients, and normals are similar in our dataset, which shows that the overall correlation patterns of patients and normals are consistent. Second, the Pearson correlation coefficients between the 6 EEG channels are high, especially between F4 and C4. This suggests that single-channel EEG may achieve performance similar to that of the fusion of 6 EEG channels. Notably, the Pearson correlation coefficient between the Chin1-Chin2 EMG and Chin3-Chin2 EMG channels is high, which shows that the two chin EMG channels are similar and that a single chin EMG channel may represent the information of both.
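A channel-by-channel Pearson matrix like those in Fig. 4 can be computed with `numpy.corrcoef`, which treats each row as one variable. The sketch assumes each subject's recording is a (C, T) array; `channel_correlation` and `group_correlation` are hypothetical names, and averaging per-subject matrices to form a group heatmap is an assumption, since the text does not say how the heatmaps were aggregated.

```python
import numpy as np


def channel_correlation(signals):
    """Pearson correlation matrix between channels.

    signals: array of shape (C, T), one row per channel (e.g., F4-M1, C4-M1, ...).
    Returns a (C, C) matrix whose entry [i, j] is Pearson's r between channels i and j.
    """
    return np.corrcoef(signals)


def group_correlation(recordings):
    """Average the per-subject correlation matrices of a group (e.g., all patients).

    Averaging per subject is an assumed aggregation, not taken from the text.
    """
    return np.mean([channel_correlation(r) for r in recordings], axis=0)
```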

A. Problem Formulation
Our model is designed in an end-to-end fashion: it processes a sequence of sleep epochs and outputs a narcolepsy prediction together with a sequence of predicted sleep stages. We denote a sleep epoch as $x \in \mathbb{R}^{n \times C}$, where $n$ is the number of sampling points in a sleep epoch and $C$ is the number of channels. The input sequence of sleep epochs is defined as $X = \{x_1, x_2, x_3, \ldots, x_L\}$, where $L$ is the number of epochs in the sequence. For automatic sleep staging, we denote the number of sleep stages as $N$, and $N = 5$ (Wake, N1, N2, N3, REM), according to the AASM sleep standard [7]. We define $\hat{Y} = \{\hat{y}_1, \hat{y}_2, \hat{y}_3, \ldots, \hat{y}_L\}$ as the sequence of sleep stages corresponding to $X = \{x_1, x_2, x_3, \ldots, x_L\}$, where $\hat{y}_i \in \{0, 1\}^N$ is the one-hot encoding of the ground-truth sleep stage of $x_i$.
For automatic narcolepsy diagnosis, we denote the number of diagnosis classes as $M$, and $M = 2$ (patient and normal). $\hat{z} \in \{0, 1\}^M$ is defined as the one-hot encoding of the ground-truth narcolepsy diagnosis of a sequence of sleep epochs $X$. Therefore, our task is defined as learning a mapping function $F$ that maps a sequence of sleep epochs $X$ into the corresponding sequence of sleep stages $\hat{Y}$ and a narcolepsy diagnosis $\hat{z}$.
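The shapes in this formulation can be made concrete with a small NumPy sketch. It assumes a single channel and $n = 3000$ samples per epoch (30 s at the 100 Hz resampling rate given in the implementation details); the class index order and the helper `one_hot` are illustrative assumptions.

```python
import numpy as np

STAGES = ["W", "N1", "N2", "N3", "REM"]    # N = 5 AASM stages
DIAGNOSES = ["patient", "normal"]           # M = 2 (index order is an assumption)
L, N_SAMPLES, C = 20, 30 * 100, 1           # 20 epochs of 30 s at 100 Hz, one channel


def one_hot(index, num_classes):
    """One-hot vector(s) in {0, 1}^num_classes for an int or a sequence of ints."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(index)]


# X = {x_1, ..., x_L}: one sequence of L epochs, each x_i in R^{n x C}
X = np.zeros((L, N_SAMPLES, C), dtype=np.float32)

# Y_hat: (L, N) one-hot ground-truth stages; z_hat: (M,) one-hot diagnosis
stage_labels = ["W"] * 5 + ["N1"] * 3 + ["N2"] * 8 + ["REM"] * 4
Y_hat = one_hot([STAGES.index(s) for s in stage_labels], len(STAGES))
z_hat = one_hot(DIAGNOSES.index("patient"), len(DIAGNOSES))
```

Note that $\hat{z}$ is attached to the whole sequence, while each row of $\hat{Y}$ belongs to one epoch.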

B. Overview
In order to diagnose narcolepsy from PSG recordings, we design an end-to-end framework that captures the sequential relationships within and between epochs, automatically scores sleep stages, and combines the scored stages with PSG recordings for narcolepsy diagnosis. Specifically, we adopt the idea of multi-task learning and take sleep staging as our auxiliary task, which contributes to improving the performance of our primary task: narcolepsy diagnosis. In the auxiliary task, we automatically score the sleep stages and simultaneously learn sleep stage features, which are then combined with narcolepsy features extracted from PSG recordings for the primary task.
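Our deep learning model is illustrated in Fig. 5. The model consists of seven modules: (1) the Epoch Feature Extraction Module, (2) the Sequence Feature Extraction Module, (3) the Sleep Stage Feature Mapping Module, (4) the Narcolepsy Feature Mapping Module, (5) the Epoch-level Sleep Stage Classifier, (6) the Task Feature Fusion Component, and (7) the Sequence-level Narcolepsy Classifier. Beyond sleep staging, existing studies on sleep disorders [30], [31] show that local salient wave features from each epoch are also helpful for disease diagnosis, such as narcolepsy.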
Accordingly, in our deep learning network, we design an Epoch Feature Extraction Module to extract local features within each epoch. The Epoch Feature Extraction Module consists of a convolutional neural network (CNN), Batch Normalization [40], and the GELU [41] activation function. Existing studies on sleep staging from PSG [8], [9], [10], [11] have shown that CNNs can capture the local features of significant waveforms. Therefore, we use a CNN to extract local features from salient waveforms within each epoch.
We feed the sleep sequence $X = \{x_1, x_2, x_3, \ldots, x_L\}$ into the Epoch Feature Extraction Module. The process is as follows:
$$X_j = \mathrm{MaxPooling}_j\big(G(BN(\mathrm{Conv}_j(X_{j-1})))\big), \quad j = 1, \ldots, J,$$
$$X_{epoch} = \mathrm{AvgPooling}(X_J),$$
where $X_j$ is the $j$-th feature map ($X_0$ is $X$), $\mathrm{Conv}_j$ is the $j$-th convolution layer of the Epoch Feature Extraction Module, $BN$ is Batch Normalization, $G$ is the GELU activation function, $\mathrm{MaxPooling}_j$ is the $j$-th max pooling layer, $\mathrm{AvgPooling}$ is an average pooling layer, and $J$ is the number of convolution blocks. Finally, the Epoch Feature Extraction Module outputs the epoch features $X_{epoch} = \{x_1^{epoch}, x_2^{epoch}, \ldots, x_L^{epoch}\}$, where $x_i^{epoch} \in \mathbb{R}^d$ and $d$ is the feature dimension.

In previous work on automatic sleep staging [12], [13], [14], Transformers or multi-head attention have been used to model global temporal context and achieve high performance. Inspired by these studies, we use a Transformer Encoder as the Sequence Feature Extraction Module, which encodes global context features through multi-head attention.
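The Conv, BN, GELU, max-pooling stack of the Epoch Feature Extraction Module can be sketched in NumPy to show how a sequence of raw epochs becomes one feature vector per epoch. This is a toy forward pass under assumptions, not the authors' PyTorch model: kernel sizes, channel counts, the pooling size, and global average pooling over time are illustrative choices (the text uses $d = 512$; the usage below uses 16), batch normalization is shown in training mode without its learnable scale and shift, and all function names are hypothetical.

```python
import numpy as np


def conv1d(x, w, b):
    """'Valid' 1-D convolution as used in deep learning (cross-correlation).

    x: (B, C_in, T), w: (C_out, C_in, K), b: (C_out,) -> (B, C_out, T - K + 1)
    """
    batch, c_in, t = x.shape
    c_out, _, k = w.shape
    out = np.empty((batch, c_out, t - k + 1))
    for i in range(batch):
        for o in range(c_out):
            out[i, o] = b[o] + sum(
                np.correlate(x[i, c], w[o, c], mode="valid") for c in range(c_in)
            )
    return out


def batch_norm(x, eps=1e-5):
    """Training-mode batch normalization per channel (scale/shift omitted)."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x):
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def max_pool1d(x, size):
    """Non-overlapping max pooling over the last axis (a trailing remainder is dropped)."""
    t = x.shape[-1] // size * size
    return x[..., :t].reshape(*x.shape[:-1], -1, size).max(axis=-1)


def epoch_features(X, layers, pool=4):
    """X: (L, n, C) sequence of epochs -> (L, d) epoch features, d = last C_out.

    The L epochs are treated as a batch; each block computes
    X_j = MaxPool_j(GELU(BN(Conv_j(X_{j-1})))), followed by average pooling over time.
    """
    h = X.transpose(0, 2, 1)                     # (L, C, n): channels first
    for w, b in layers:
        h = max_pool1d(gelu(batch_norm(conv1d(h, w, b))), pool)
    return h.mean(axis=-1)                       # average pooling over time
```

With 4 epochs of 300 samples and two blocks (8 then 16 filters), the output has shape (4, 16): one 16-dimensional vector per epoch, ready for the sequence module.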
The Transformer layer, like the standard Transformer [42], adopts scaled dot-product attention, which is defined as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$
where matrices $Q$, $K$, and $V$ consist of queries, keys, and values, respectively, and $d_k$ is the dimension of the keys. We feed the epoch features $X_{epoch}$ into the Sequence Feature Extraction Module. The process is as follows:
$$X_{seq} = \mathrm{Transformer}(X_{epoch}),$$
where $\mathrm{Transformer}$ is the standard Transformer Encoder and $X_{seq} = \{x_1^{seq}, x_2^{seq}, \ldots, x_L^{seq}\}$ are the sequence context features.
Fig. 6. Illustration of the multi-task process (here only L = 4 for visual purposes). FC is a fully-connected layer, and + is element-wise addition.
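The scaled dot-product attention formula above can be checked with a single-head NumPy sketch. It is a minimal illustration, not the module itself: the function name is hypothetical, and a real Transformer Encoder (8 heads and 512 hidden states in the implementation details) also applies learned query, key, and value projections, feed-forward layers, and residual connections.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single head.

    Q: (L_q, d_k), K: (L_k, d_k), V: (L_k, d_v). Returns (output, weights).
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (L_q, L_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ V, weights
```

As plain self-attention over a sequence of $L = 20$ epoch features, `scaled_dot_product_attention(X_epoch, X_epoch, X_epoch)` returns a (20, 20) weight matrix, so every output epoch feature mixes information from all 20 epochs. This is the global context that the CNN alone cannot see.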

D. Narcolepsy Diagnosis With Sleep Staging Features
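As Fig. 2 and Fig. 3 in Section II-B show, there are significant differences in the proportion and transition of sleep stages between patients and normals. Some existing studies [31], [33] have shown that known sleep stage information can improve the performance of narcolepsy diagnosis. Therefore, we take advantage of sleep staging for narcolepsy diagnosis. We adopt the idea of multi-task learning, taking sleep staging as the auxiliary task to automatically extract sleep stage features for narcolepsy diagnosis.
1) Auxiliary Task: Sleep Staging: The Sleep Stage Feature Mapping Module maps the sequence context features $X_{seq}$ into sleep stage features $X_{stage} = \{x_1^{stage}, x_2^{stage}, \ldots, x_L^{stage}\}$, where $x_i^{stage} \in \mathbb{R}^{d'}$ and $d'$ is the task-guided feature dimension. The Epoch-level Sleep Stage Classifier then predicts stage probabilities $y_i$ for each epoch, and the sleep staging loss is the cross-entropy
$$\mathcal{L}_{stage} = -\frac{1}{L}\sum_{i=1}^{L}\sum_{j=1}^{N}\hat{y}_{i,j}\log y_{i,j},$$
where $y_{i,j} \in \mathbb{R}$, the $j$-th element of $y_i$, denotes the probability that the $i$-th epoch is predicted as the $j$-th sleep stage class, and $\hat{y}_{i,j} \in \{0, 1\}$, the $j$-th element of $\hat{y}_i$, equals 1 if the $i$-th epoch actually belongs to the $j$-th class and 0 otherwise.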
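The epoch-level classifier (one fully-connected layer plus softmax) and the staging loss can be sketched in NumPy. The function names are hypothetical, and the small `eps` inside the logarithm is an added numerical safeguard, not part of the loss definition.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max subtraction for numerical stability."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def stage_classifier(X_stage, W, b):
    """Epoch-level classifier: one FC layer + softmax, (L, d') -> (L, N)."""
    return softmax(X_stage @ W + b)


def staging_loss(Y, Y_hat, eps=1e-12):
    """Cross-entropy averaged over the L epochs of a sequence.

    Y: (L, N) predicted probabilities y_{i,j}; Y_hat: (L, N) one-hot ground truth.
    """
    return float(-np.mean(np.sum(Y_hat * np.log(Y + eps), axis=1)))
```

A uniform prediction over the five stages gives a loss of log 5 ≈ 1.609 regardless of the true stages, which is a useful sanity check at the start of training.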
2) Primary Task: Narcolepsy Diagnosis: Our narcolepsy diagnosis process is shown in Fig. 6. Since narcolepsy diagnosis is a sequence-level task, we average the sequence context features $X_{seq}$ before feeding them into an MLP. The narcolepsy feature mapping is as follows:
$$x_{narcolepsy} = \mathrm{MLP}\left(\frac{1}{L}\sum_{i=1}^{L} x_i^{seq}\right),$$
where $x_i^{seq}$ is the $i$-th feature of the sequence context features $X_{seq}$, $\mathrm{MLP}$ is a multilayer perceptron consisting of two fully-connected layers, and $x_{narcolepsy} \in \mathbb{R}^{d'}$ is the narcolepsy feature. To make sleep staging an auxiliary task of narcolepsy diagnosis, we design a Task Feature Fusion Component that fuses the narcolepsy feature $x_{narcolepsy}$ with the sleep stage features $x_i^{stage}$ (the $i$-th feature of $X_{stage}$), as illustrated in Fig. 6, producing the fused narcolepsy feature $x_{fusion} \in \mathbb{R}^{d'}$. We then feed $x_{fusion}$ into the Sequence-level Narcolepsy Classifier, which consists of a fully-connected layer and a softmax function, to obtain $z \in \mathbb{R}^M$, the predicted probability over the $M$ narcolepsy classes for the sleep sequence. We use the cross-entropy function as the narcolepsy diagnosis loss:
$$\mathcal{L}_{diag} = -\sum_{j=1}^{M}\hat{z}_j\log z_j,$$
where $z_j \in \mathbb{R}$, the $j$-th element of $z$, denotes the probability that the sequence is predicted as the $j$-th narcolepsy diagnosis class, and $\hat{z}_j \in \{0, 1\}$, the $j$-th element of $\hat{z}$, equals 1 if the sequence actually belongs to the $j$-th narcolepsy diagnosis class and 0 otherwise.
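The sequence-level branch can be sketched in NumPy to show how averaged sequence features, the MLP, fusion, and the classifier fit together. Averaging over the $L$ epochs and the two-layer MLP follow the text. The ReLU between the two layers and the fusion rule (an FC layer on the averaged sleep stage features, then element-wise addition, loosely read from the "+" in Fig. 6) are assumptions. `narcolepsy_head`, `diagnosis_loss`, and the weight names are hypothetical.

```python
import numpy as np


def softmax(v):
    """Softmax of a 1-D vector with max subtraction for stability."""
    e = np.exp(v - v.max())
    return e / e.sum()


def narcolepsy_head(X_seq, X_stage, p):
    """Sequence-level branch: (L, d) and (L, d') features -> z in R^M.

    p holds hypothetical weights: W1/b1, W2/b2 (two-layer MLP),
    Wf/bf (assumed fusion FC), Wc/bc (classifier FC).
    """
    x_avg = X_seq.mean(axis=0)                            # (1/L) sum_i x_i^seq
    hidden = np.maximum(0.0, x_avg @ p["W1"] + p["b1"])   # FC + ReLU (activation assumed)
    x_nar = hidden @ p["W2"] + p["b2"]                    # x_narcolepsy in R^{d'}
    # Assumed fusion: FC on averaged stage features, then element-wise addition
    x_fusion = x_nar + X_stage.mean(axis=0) @ p["Wf"] + p["bf"]
    return softmax(x_fusion @ p["Wc"] + p["bc"])          # z in R^M


def diagnosis_loss(z, z_hat, eps=1e-12):
    """Cross-entropy -sum_j z_hat_j log z_j for one sequence."""
    return float(-np.sum(z_hat * np.log(z + eps)))
```

Because the stage features enter the diagnosis path, gradients from the diagnosis loss also flow into the sleep staging branch. This coupling is what distinguishes the end-to-end design from the two-phase baseline.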

E. Joint Training
In the training procedure, sleep staging and narcolepsy diagnosis are trained jointly with a single objective function consisting of two parts, the staging loss and the diagnosis loss, combined as described in Equation 10, where λ is the coefficient weighting the two loss functions for sleep staging and narcolepsy diagnosis.
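The text states only that λ (set to 0.5 in the implementation details) weights the two losses. The sketch below assumes a convex combination, λ times the staging loss plus (1 − λ) times the diagnosis loss. This is one plausible reading, not a reproduction of Equation 10; another common form is the diagnosis loss plus λ times the staging loss. `joint_loss` is a hypothetical name.

```python
def joint_loss(loss_stage, loss_diag, lam=0.5):
    """Assumed weighting: lam * staging loss + (1 - lam) * diagnosis loss.

    One plausible form of the joint objective; the exact Equation 10 may differ.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * loss_stage + (1.0 - lam) * loss_diag
```

Since a single scalar is optimized, one backward pass updates the shared epoch and sequence feature extractors with gradients from both tasks.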

IV. EXPERIMENT
A. Performance Measurement and Implementation
We use accuracy (ACC) and F1-score (F1) to measure model performance. Since sleep staging is a multi-class classification problem, we use the Macro F1 score instead of the F1 score for it. In other words, we calculate the F1 score for each class and report their mean as the Macro F1 score.
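The metrics can be written out in plain Python to make the class-wise averaging explicit. The helper names are hypothetical; returning 0.0 for a class that never appears in either the labels or the predictions is a convention chosen here. For reference, scikit-learn's `f1_score` with `average="macro"` computes the same unweighted mean.

```python
def per_class_f1(y_true, y_pred, labels):
    """F1 = 2TP / (2TP + FP + FN) for each class in `labels` (0.0 if undefined)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Macro averaging gives rare stages such as N1 the same weight as N2, so it penalizes models that neglect minority stages more than accuracy does.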
Following most existing methods on automatic sleep staging, we adopted a subject-wise 6-fold cross-validation policy by dividing the subjects in the dataset into 6 groups. In each fold, five groups were used for training and the remaining one for testing, ensuring that data from the same subject never appear in both the training set and the testing set. In addition, we ensured that the folds have approximately the same number of subjects and that each includes both patients and normals. The details of data splitting are shown in Tab. III.
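A subject-wise split with both classes in every fold can be sketched with the standard library. The actual assignment used in the paper is listed in Tab. III; the function names, the seed, and the round-robin dealing below are illustrative assumptions.

```python
import random


def subject_wise_folds(patients, normals, k=6, seed=0):
    """Split subject IDs into k folds so that no subject spans two folds.

    Patients and normals are shuffled separately, concatenated, and dealt
    round-robin, so fold sizes differ by at most one, and every fold gets
    both classes as long as each class has at least k subjects.
    """
    rng = random.Random(seed)
    p, n = list(patients), list(normals)
    rng.shuffle(p)
    rng.shuffle(n)
    folds = [[] for _ in range(k)]
    for i, subject in enumerate(p + n):
        folds[i % k].append(subject)
    return folds


def train_test_splits(folds):
    """Yield (train_ids, test_ids): one fold for testing, the rest for training."""
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test
```

With 50 patients and 27 normals, this gives five folds of 13 subjects and one of 12. Splitting by subject ID, rather than by epoch, is what prevents epochs from the same night from leaking into both sets.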
We implemented our deep learning model in PyTorch [43]. We evaluated EEG, EOG, ECG, EMG (including chin EMG and leg EMG), and nasal pressure signals from the PSG recordings on our deep learning model. The model was trained using the Adam optimizer with default settings and a learning rate of 1e-4. The mini-batch size was set to 32 and the dropout [44] rate to 0.1. We adopted an early stopping [45] policy in the training process: if the model did not achieve better performance for ten consecutive epochs, training ended.
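The early stopping rule (stop after ten consecutive training epochs without improvement) can be sketched as a small helper. The text does not say which metric or data split is monitored, so `EarlyStopping` below, a hypothetical class rather than a PyTorch API, tracks any higher-is-better score.

```python
class EarlyStopping:
    """Stop when the monitored score has not improved for `patience` epochs.

    Hypothetical helper; higher scores are treated as better.
    """

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's score; return True when training should stop."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, one would call `if stopper.step(score): break` after each training epoch and restore the weights saved at the best score.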

B. Compared Methods
In our experiments, we compared our proposed method with the following approaches on sleep staging and narcolepsy diagnosis. For a fair comparison, all approaches were evaluated on the same dataset and adopted the same subject-wise training policy:
SVM (Support Vector Machine) [46] uses a Gaussian kernel function for automatic sleep staging and narcolepsy diagnosis.
RF (Random Forests) [47] is an ensemble learning method.
CNN (Convolutional Neural Network) is used as the feature extractor of raw PSG recordings for automatic sleep staging and narcolepsy diagnosis.
CNN + RNN, where CNN is used to extract local features within each epoch and RNN is used to extract context features from an epoch sequence.
Transformer is used as the feature extractor of PSG recordings for automatic sleep staging and narcolepsy diagnosis.

C. Overall Results
We first compared our model with other approaches for sleep staging and narcolepsy diagnosis on single-channel EEG (F4-M1). Previous studies have shown that using EEG achieves good performance [8], [36]. Here, all approaches were evaluated using EEG signals for sleep staging and narcolepsy diagnosis. As Tab. IV shows, our method achieves the best performance. SVM and RF perform the worst, about 16% lower in accuracy than our method (65.85% vs.

A. Analysis of Sleep Staging Task
To investigate the effectiveness of the auxiliary sleep staging task, the Task Feature Fusion Component, and the end-to-end design, we compared our model with the following three methods:
Single-Task Method: We set the single-task method as a baseline, where we ablate the Sleep Stage Feature Mapping Module, the Task Feature Fusion Component, and the Epoch-level Sleep Stage Classifier from our model.
No-Fusion Method: We set the no-fusion method as another baseline, where we only ablate the Task Feature Fusion Component. This model can classify sleep stages and narcolepsy, but the sleep stage features and narcolepsy features are not fused for narcolepsy diagnosis.
Two-Phase Method: In the two-phase method, sleep stages are automatically scored in the first phase, and narcolepsy is diagnosed in the second phase. The two tasks are trained separately.
For a fair comparison, we used the same hyperparameters for these models as for our model. The results of the ablation experiments are shown in Tab. V. The single-task method performs the worst on narcolepsy diagnosis, 2.97% lower in accuracy and 2.76% lower in F1 than our model (75.97% vs. 78.94% in accuracy and 82.69% vs. 85.45% in F1). This is reasonable: without the sleep staging task, the single-task method cannot effectively extract features or learn the transition rules of sleep stages, which help classify narcolepsy. The no-fusion method performs close to our model on sleep staging (80.84% vs. 81.24% in accuracy and 75.04% vs. 74.85% in Macro-F1). Clearly, ablating the Task Feature Fusion Component has no significant impact on the performance of sleep staging. These results indicate that the two-phase method works well in sleep staging. However, the sleep stages scored in the first phase contain incorrect labels that cannot be corrected in the second phase, causing cumulative error and leading to poor performance in disorder diagnosis. All the results demonstrate the importance of setting sleep staging as the auxiliary task for narcolepsy diagnosis.

B. Analysis of Highly Correlated Channels
As shown in Fig. 4, some channels are highly correlated, such as the six EEG channels and the two chin EMG channels. In EEG, the F4-M1 and C4-M1 channels are highly correlated; in EMG, the correlation coefficient between the two chin EMG channels is high. Highly correlated channels carry redundant information, so choosing only one channel from such a group as input to our deep learning model may suffice to achieve high performance. Here, we tested the model performance when using a single channel and when using highly correlated channels, respectively. Specifically, we evaluated our model on single channels (F4-M1, C4-M1, F3-M2, C3-M2, O2-M1, O1-M1 in EEG and Chin1-Chin2, Chin3-Chin2, LegL, LegR in EMG), two highly correlated channels (F4-M1 + C4-M1 and Chin1-Chin2 + Chin3-Chin2), all EEG channels, and all EMG channels, as shown in Tab. VI.
For EEG, our model achieves the best performance on narcolepsy diagnosis when using only F4-M1 (78.94% in accuracy and 85.45% in F1). Compared with F4-M1, our model using C4-M1 performs slightly worse on sleep staging (81.24% vs. 80.45% in accuracy and 74.85% vs. 74.31% in Macro-F1) and narcolepsy diagnosis (78.94% vs. 77.10% in accuracy and 85.45% vs. 83.48% in F1). In addition, our model using other single-channel EEG performs much worse than with F4-M1.

C. Analysis of Multiple Modalities
In this experiment, we investigated the predictive abilities of different modalities and their combinations for narcolepsy diagnosis, including EEG, EOG, EMG, ECG, nasal pressure, EEG+EOG, EEG+EMG, EEG+ECG, and EEG+nasal pressure. In the single-modality experiments, EEG refers to the single F4-M1 channel, which achieves the best performance in narcolepsy diagnosis. According to the AASM standard [7], EOG is also an important criterion for experts to assign sleep stages. In addition, EMG, ECG, and nasal pressure are less helpful for sleep staging, with all accuracies below 60%, but they are relatively useful for narcolepsy diagnosis, with all accuracies above 70%.
When using the combined modalities, on sleep staging, our model using EEG+EOG performs the best, about

D. Subject-Level Case Study
To better understand the narcolepsy diagnosis, we selected one patient from our dataset to illustrate her/his hypnogram through the whole night. Here, we present the ground-truth hypnogram and the hypnogram automatically scored by our model for this patient in Fig. 7(a) and (b), respectively.
Fig. 7. Case study on one patient. In (c), "Yes" denotes that this sequence is predicted as narcolepsy. "No" denotes that this sequence is not predicted as narcolepsy.
Given an input sequence of 20 epochs, our model outputs one diagnosis result, so each subject receives multiple diagnosis results. We can therefore determine the diagnosis at the subject level: for each subject, we take all of her/his diagnosis results into account, and if more than 50% of them indicate narcolepsy, we diagnose the subject with narcolepsy. In this way, our model achieves 100% accuracy in subject-level narcolepsy diagnosis. This suggests that aggregating multiple sequence-level diagnosis results could improve the robustness of our model.
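The subject-level rule (diagnose narcolepsy when more than 50% of sequence-level results are positive) can be sketched in plain Python. How a night is cut into 20-epoch sequences is not specified; non-overlapping windows and dropping a trailing partial sequence are assumptions. All function names are hypothetical.

```python
def make_sequences(num_epochs, seq_len=20):
    """(start, end) epoch indices of non-overlapping sequences of seq_len epochs.

    Non-overlapping windows and dropping a trailing partial sequence are assumptions.
    """
    return [(s, s + seq_len) for s in range(0, num_epochs - seq_len + 1, seq_len)]


def subject_diagnosis(sequence_predictions):
    """True (narcolepsy) if more than 50% of sequence-level predictions are positive."""
    votes = [bool(v) for v in sequence_predictions]
    return sum(votes) > len(votes) / 2


def subject_accuracy(predictions_by_subject, labels_by_subject):
    """Fraction of subjects whose majority-vote diagnosis matches the label."""
    hits = sum(subject_diagnosis(predictions_by_subject[s]) == labels_by_subject[s]
               for s in labels_by_subject)
    return hits / len(labels_by_subject)
```

An 8-hour night has about 960 epochs, so under these assumptions each subject contributes roughly 48 votes. This is why a handful of misclassified sequences, as in Fig. 7(c), need not change the subject-level decision.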

E. Limitations
We must acknowledge the limitations of the dataset used in this work. First, due to the difficulty of recruiting a large number of patients, the total number of subjects in our dataset is relatively small (77 in total). All subjects in our dataset are from China, so our conclusions mainly apply to a Chinese population. Second, the sleep disorder covered by our dataset is limited to narcolepsy. Many other sleep disorders, such as insomnia, restless legs syndrome, and sleep apnea, also endanger people's health, but we cannot study them with our dataset. Finally, classifying narcolepsy into fine-grained categories, including type 1 narcolepsy, type 2 narcolepsy, and unspecified narcolepsy, remains challenging; the labels in our dataset are limited to narcolepsy and normal, without fine-grained categories. In the future, we will continue to study sleep disorders and try to address these challenges.

VI. CONCLUSION
In clinical practice, it is difficult for doctors to diagnose narcolepsy correctly and objectively. In this paper, we address the problem of diagnosing narcolepsy automatically and objectively using PSG signals. We collected a dataset of PSG recordings from 77 participants. We propose a novel end-to-end framework for narcolepsy diagnosis, which embeds the sequential relationships within and between epochs in PSG signals, automatically scores sleep stages, and combines the sleep stage related features with narcolepsy features for narcolepsy diagnosis. In particular, we adopt the idea of multi-task learning, taking sleep staging as the auxiliary task and narcolepsy diagnosis as the primary task. The framework was evaluated on the collected dataset, and the results show that both the sleep stage features and the end-to-end design help diagnose narcolepsy. Moreover, we conduct a comprehensive analysis of the PSG recordings, including the importance of sleep staging for the diagnosis, highly correlated channels, and the predictive ability of different modalities (e.g., EEG, EOG, EMG, and ECG).

Fig. 2. Significance test on the number of epochs in each sleep stage for normals and patients.

Fig. 3. The hypnogram of one whole-night recording from (a) one patient and (b) one normal.

2) Sequence Feature Extraction Module: Transition patterns of sleep stages between epochs play a critical role in sleep staging [7]. Therefore, modeling the relationship between sleep epochs in a sequence is helpful for sleep staging. In addition, for narcolepsy diagnosis, extracting global context features from the sequence of sleep epochs avoids being limited to the local waveform characteristics within an epoch. In other words, modeling the sleep sequence expands the receptive field of the model so that it can learn global characteristics of the waveform, which can improve the performance of narcolepsy diagnosis. Given its effectiveness in modeling global relationships, we propose a Sequence Feature Extraction Module to extract context features between epochs in a sleep sequence.
Meanwhile, we present the narcolepsy diagnosis results obtained by our model in Fig. 7(c). As Fig. 7(a) and (b) show, the sleep stages of most epochs of this patient are correctly scored by our model, and only a few epochs are misclassified. It is difficult to correctly score sleep stages during rapid stage transitions, because the sequential relationship among such sleep fragments is hard to model. As Fig. 7(c) shows, our model correctly diagnoses narcolepsy for most sequences.
We first design an Epoch Feature Extraction Module to extract local features within each epoch of the raw PSG signals. The epoch features are then fed into the Sequence Feature Extraction Module. Next, we design two task-guided feature mapping modules: the Sleep Stage Feature Mapping Module, which maps features for sleep staging, and the Narcolepsy Feature Mapping Module, which maps features for narcolepsy diagnosis. The sequence features are fed into both modules to obtain sleep stage features and narcolepsy features, respectively. The sleep stage features are fed into the Epoch-level Sleep Stage Classifier to predict sleep stages, and are also fed, together with the narcolepsy features, into the Task Feature Fusion Component to obtain fused narcolepsy features. Finally, the fused narcolepsy features are fed into the Sequence-level Narcolepsy Classifier to diagnose narcolepsy.

Feature Extraction Module

1) Epoch Feature Extraction Module: Local salient wave features are critical to sleep experts in sleep staging [7]. For automatic sleep staging, extracting features from local salient waveforms within each epoch helps classify sleep stages at the epoch level. In addition, existing studies on sleep disorder [7]
$\in \mathbb{R}^{d'}$, where $d'$ is the task-guided feature dimension. After mapping the sequence context features $X_{seq}$ into sleep stage features $X_{stage}$, we feed $X_{stage}$ into the Epoch-level Sleep Stage Classifier, which consists of a fully-connected layer and a softmax function, to obtain $Y = \{y_1, y_2, y_3, \ldots, y_L\}$, where $y_i \in \mathbb{R}^{N}$ is the predicted probability over the $N$ sleep stage classes for the $i$-th epoch. We use the cross-entropy (CE) function as the sleep staging loss function:

The Transformer block of the Sequence Feature Extraction Module has 8 heads and 512 hidden states. We set the length of the sleep epoch sequence to $L = 20$, the feature dimension to $d = 512$, the task-guided feature dimension to $d' = 128$, and the coefficient of the two loss functions to $\lambda = 0.5$. Before being fed into the deep learning model, the EEG, EOG, ECG, EMG, and Nasal Pressure signals were resampled to 100 Hz. We trained the model on a machine with an Intel Core i9-10900K CPU and eight NVIDIA RTX 3080 GPUs.
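A toy forward pass and loss can make the multi-task data flow concrete. The sketch below starts from sequence context features $X_{seq}$ (the CNN and Transformer stages are skipped), maps them through two task-guided heads, classifies each epoch into $N$ sleep stages, fuses the two feature sets, and outputs a sequence-level narcolepsy probability. Several details are assumptions for illustration only, not the paper's confirmed design: the fusion rule (concatenation), the pooling (mean over epochs), the sigmoid output, the binary cross-entropy diagnosis loss, $N = 5$ stages, the way $\lambda$ weights the two losses, and every function name. Sizes are shrunk from the reported $d = 512$ and $d' = 128$.

```python
import math
import random

# Toy sizes; the paper reports L = 20, d = 512, d' = 128.
L, D, D_PRIME = 20, 16, 8
N_STAGES = 5   # ASSUMED: W, N1, N2, N3, R
LAMBDA = 0.5   # reported loss coefficient


def make_layer(in_dim, out_dim, rng):
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def linear(x, layer):
    weight, bias = layer  # y = W x + b
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def forward(x_seq, p):
    """x_seq: L sequence context features of size D (X_seq)."""
    x_stage = [relu(linear(x, p["stage_map"])) for x in x_seq]  # X_stage, size D'
    x_narc = [relu(linear(x, p["narc_map"])) for x in x_seq]    # narcolepsy features
    y = [softmax(linear(s, p["stage_clf"])) for s in x_stage]    # y_i over N stages
    fused = [s + n for s, n in zip(x_stage, x_narc)]             # ASSUMED: concatenation
    pooled = [sum(col) / len(fused) for col in zip(*fused)]      # ASSUMED: mean pooling
    logit = linear(pooled, p["narc_clf"])[0]
    return y, 1.0 / (1.0 + math.exp(-logit))                     # ASSUMED: sigmoid


def staging_ce(y, labels, eps=1e-12):
    # Cross-entropy averaged over the epochs of the sequence.
    return -sum(math.log(yi[c] + eps) for yi, c in zip(y, labels)) / len(labels)


def diagnosis_bce(p_narc, is_patient, eps=1e-12):
    # ASSUMED: binary cross-entropy for the sequence-level diagnosis.
    return -(is_patient * math.log(p_narc + eps)
             + (1 - is_patient) * math.log(1.0 - p_narc + eps))


def total_loss(y, labels, p_narc, is_patient, lam=LAMBDA):
    # ASSUMED weighting: primary diagnosis loss + lam * auxiliary staging loss.
    return diagnosis_bce(p_narc, is_patient) + lam * staging_ce(y, labels)


rng = random.Random(0)
params = {
    "stage_map": make_layer(D, D_PRIME, rng),
    "narc_map": make_layer(D, D_PRIME, rng),
    "stage_clf": make_layer(D_PRIME, N_STAGES, rng),
    "narc_clf": make_layer(2 * D_PRIME, 1, rng),
}
x_seq = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(L)]
stage_labels = [rng.randrange(N_STAGES) for _ in range(L)]
y, p_narc = forward(x_seq, params)
loss = total_loss(y, stage_labels, p_narc, is_patient=1)
```

Because both losses are computed from one forward pass, gradients from the auxiliary staging loss would also shape the shared sequence features, which is the point of the end-to-end multi-task design compared with a two-phase pipeline.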
67.21% vs. 81.24% on sleep staging and 61.66% vs. 61.78% vs. 78.94% on narcolepsy diagnosis). This indicates that the sequential relationship in EEG signals, which traditional machine learning methods cannot model, is important for the diagnosis. CNN performs worse than our method on sleep staging (79.32% vs. 81.24% in accuracy and 72.09% vs. 74.85% in Macro-F1) and on narcolepsy diagnosis (72.82% vs. 78.94% in accuracy and 80.72% vs. 85.45% in F1), indicating that a fully convolutional model without a context feature extractor cannot model the sequential relationship between epochs well, even though this relationship helps disorder diagnosis. CNN+RNN, where a CNN is used as the epoch feature extractor and an RNN as the sequence feature extractor, performs worse than our method on sleep staging (79.91% vs. 81.24% in accuracy and 71.87% vs. 74.85% in Macro-F1) and on narcolepsy diagnosis (74.81% vs. 78.94% in accuracy and 81.81% vs. 85.45% in F1). Transformer, which uses a Transformer alone to capture both local and global features from EEG signals, performs 2.60% lower in accuracy and 1.86% lower in F1 than our method on narcolepsy diagnosis (76.34% vs. 78.94% in accuracy and 83.59% vs. 85.45% in F1). A Transformer alone cannot extract local features within each epoch of EEG signals well. Compared with the other approaches, our deep learning model, which uses a CNN as the epoch feature extractor, a Transformer as the sequence feature extractor, and sleep staging as the auxiliary task, models both local and global features of EEG signals and makes full use of sleep stage information to improve narcolepsy diagnosis.

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

TABLE V ANALYSIS OF AUXILIARY TASK IN SINGLE-CHANNEL EEG (F4-M1)

TABLE VI THE RESULTS OF USING SINGLE CHANNEL AND MULTIPLE CHANNELS

staging. However, on narcolepsy diagnosis, the no-fusion method performs 1.84% lower in accuracy and 0.90% lower in F1 than our model (77.10% vs. 78.94% in accuracy and 84.55% vs. 85.45% in F1). This further indicates that the sleep stage features improve the performance of narcolepsy diagnosis. The two-phase method performs close to our model on sleep staging (80.69% vs. 81.24% in accuracy and 74.58% vs. 74.85% in Macro-F1). However, for narcolepsy diagnosis, it performs 1.83% lower in accuracy and 2.89% lower in F1 than our model (77.11% vs. 78.94% in accuracy and 82.56% vs. 85.45% in F1).

TABLE VII THE RESULTS OF USING SINGLE MODALITIES AND MULTIPLE MODALITIES

15% vs. 80.40% vs. 79.97% in F1). In Tab. 4, the Pearson correlation coefficients between Chin EMG and Leg EMG are low. This indicates that Leg EMG carries information different from that of Chin EMG: Leg EMG performs worse than Chin EMG on sleep staging but better on narcolepsy diagnosis. It is worth noting that our model using all EMG channels performs better on narcolepsy diagnosis than using Chin3-Chin2 alone (74.18% vs. 70.88% in accuracy and 80.57% vs. 79.97% in F1). This further indicates that Leg EMG provides effective features for narcolepsy diagnosis.
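The channel correlation analysis relies on the Pearson correlation coefficient. As a minimal sketch, the hypothetical function below computes it for two equal-length signals, for example a Chin EMG and a Leg EMG trace resampled to the same rate; the function name and toy inputs are illustrative only. It assumes neither signal is constant, since a zero standard deviation makes the coefficient undefined (and would raise a division error here).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance and standard deviations share the 1/n factor, so it cancels.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy traces standing in for two EMG channels sampled at the same rate.
chin = [0.1, 0.4, -0.2, 0.3, 0.0, -0.1]
leg = [0.0, -0.3, 0.5, 0.1, -0.2, 0.2]
r = pearson(chin, leg)
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient; the explicit version is shown only to make the formula visible.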
Table VII shows the performance comparison. As Tab. VII shows, when using the single modality of EEG, our method achieves the best sleep staging performance (81.24% in accuracy and 74.85% in Macro-F1) and the best narcolepsy diagnosis performance (78.94% in accuracy and 85.45% in F1) among the single-modality results, indicating that EEG is the most predictive modality in PSG recordings for both sleep staging and narcolepsy diagnosis. EOG also performs well: its sleep staging results are close to those of EEG (81.11% vs. 81.24% in accuracy and 74.28% vs. 74.85% in Macro-F1), but its narcolepsy diagnosis results are lower than those of EEG (76.63% vs. 78.94% in accuracy and 84.06% vs. 85.45% in F1). According to the AASM sleep standard