Association Between Sleep Quality and Deep Learning-Based Sleep Onset Latency Distribution Using an Electroencephalogram

To evaluate sleep quality, it is necessary to monitor overnight sleep. However, sleep monitoring typically requires more than 7 hours of recording, which can be inefficient in terms of data size and analysis. Therefore, we developed a deep learning-based model that uses a 30 s sleep electroencephalogram (EEG) recorded early in the sleep cycle to predict the sleep onset latency (SOL) distribution and to explore its association with sleep quality (SQ). The proposed deep learning model is composed of a structure that decomposes and restores the signal in epoch units and a structure that predicts the SOL distribution. We used the Sleep Heart Health Study public dataset, which includes a large number of participants, to train and evaluate the proposed model. The proposed model estimated the SOL distribution and divided it into four clusters. An advantage of the proposed model is that it shows the process of falling asleep for individual participants as a probability graph over time. Furthermore, we compared SOL thresholds against good SQ and showed that an SOL of less than 10 minutes was most strongly associated with good SQ. Moreover, SOL was the sleep feature best suited to prediction from early EEG, compared with total sleep time, sleep efficiency, and actual sleep time. Our study demonstrates the feasibility of estimating the SOL distribution using deep learning with early EEG and shows that an SOL within 10 minutes is associated with good SQ.


Seungwon Oh, Young-Seok Kweon, Gi-Hwan Shin, and Seong-Whan Lee, Fellow, IEEE

I. INTRODUCTION
SLEEP is vital in daily life and is related to memory, the immune system, and metabolism [1]. In particular, poor sleep quality (SQ) can lead to sleep disorders, such as insomnia and sleep apnea, and potentially cause obesity, diabetes, cardiovascular disease, and depression. Millions of people worldwide experience sleep disorders, and the need for research on good-quality sleep is increasing [2], [3], [4]. Therefore, an accurate assessment of SQ is necessary to improve one's well-being and productivity.
Two main methods are used for assessing SQ: questionnaires and sensors. Questionnaire assessment is a self-reporting method wherein the responses are scored and the SQ is classified. Questionnaire assessment does not require medical knowledge. However, the Pittsburgh SQ Index (PSQI) and other questionnaire assessments have several disadvantages. First, there may be missing or inaccurate responses; participants may not understand the questionnaire items properly or may provide convenient responses owing to a poor memory of the sleep process. Second, it is difficult to obtain detailed information about sleep; participants may not understand the technical characteristics of the various sleep stages, such as the percentages of light and deep sleep, or the sleep onset latency. Lastly, because questionnaires can only be completed after sleep, they are suitable as a variable for evaluating SQ but are unsuitable for predicting SQ.
Sensor-based assessment involves the observation of various signals during sleep using contact and non-contact methods. The most commonly used sensor suite for sleep assessment is polysomnography, which consists of electrooculography, electromyography, electrocardiography, and electroencephalography (EEG) [5], [6]. EEG measures brain activity through electric fields; therefore, it has recently been used in research on sleep monitoring and sleep classification [7], [8], [9], [10]. Quantitative assessment using sensors has the advantage of being more objective because it involves overnight observation. However, it is difficult to analyze such large datasets, which contain an average of about 7 h of recording per night [11], [12].
To evaluate the quality of sleep, various sleep components should be considered, for example, the percentage of deep sleep, total sleep time, time in bed after sleep offset, and sleep onset latency (SOL). Most sleep components are difficult to observe early because they require observation of the entire sleep period. However, SOL can be observed early in the sleep period, so it can provide an early indication of sleep quality. SOL refers to the time taken to fall asleep after the lights are turned off; studies have shown that prolonged SOL, wherein people have difficulty falling asleep, can negatively affect SQ, sleep duration, and overall health [13]. Therefore, if the cause of long SOL can be identified in advance, overall SQ can be improved [14], [15]. Consequently, an accurate prediction of SOL can help predict SQ.
Distribution-based models have several advantages over point estimation. First, they are robust to outliers because they consider a probability distribution rather than a single point. Second, distribution-based models have good interpretability because they can infer the shape of an entire dataset using few parameters. However, distribution-based models may not reflect the characteristics of the data if the assumed prior distribution is incorrect. Therefore, it is necessary to assume a distribution that fits the data.
The purpose of this study was to estimate the distribution of SOL using an early 30 s EEG and to assess its association with good SQ. SQ can be estimated by overnight monitoring, which is time-consuming and difficult to predict directly. Therefore, we used early EEG recorded after the lights were turned off to estimate the distribution of SOL and clustered it. We then compared the relationships between the classified SOL clusters and sleep quality to identify a standard SOL for good sleep. To do this, we use an end-to-end deep-learning-based model that predicts the SOL distribution and performs clustering. The process of extracting features from the EEG was adapted from the structures used in previous sleep stage classification models. We constructed the model using a CNN, which extracts spatial information from the EEG, and a transformer for the parametric features and reconstruction.
This study differs from previous studies in that we propose a novel approach to assessing sleep quality. Most previous studies evaluating sleep quality have classified five sleep stages, which requires long-term sleep observation of about 7 to 10 hours. In this study, by contrast, we estimate the distribution of SOL from a short 30-second observation early in the sleep period and can therefore predict sleep quality early.
To the best of our knowledge, this is the first study to estimate the SOL distribution using deep learning with EEG and to determine its relevance to SQ. The main contributions of this study are as follows: 1) to develop a deep learning-based model to estimate the distribution of SOL using short-term EEG after turning off the lights; 2) to examine the association between SQ and SOL and propose an SOL standard for good SQ.

A. Sleep Quality
The PSQI is a commonly used questionnaire that assesses several aspects of sleep and subjectively evaluates SQ [16]. The PSQI consists of a 19-item questionnaire scored on a scale of 0 to 3; a total score of < 6 indicates good SQ. The questionnaire assessment evaluates SQ through a personalized assessment of a participant's sleep. Conventional methods include the PSQI, the mini sleep questionnaire [17], and the insomnia severity index [18]. A scoring method that utilizes sensor-based data related to sleep has also been proposed [19]. It includes six components: sleep onset latency (SOL), total sleep time, sleep efficiency, sleep disturbance, percentage of deep sleep, and percentage of rapid eye movement (REM) sleep. These sleep parameters have been widely used in previous studies [20], [21]. The ranges, definitions, and scores of each sleep parameter are shown in Table I. The SQ score is the sum of the six sleep parameter scores proposed in a previous study.

B. Time Distribution
A time distribution describes the change of a variable over time and is therefore defined on the space of positive real numbers. Survival analysis is a representative field that uses time distributions. A deep survival machine [22] is a deep learning-based, fully parametric survival model using Weibull and log-Gaussian distributions. In [23], Markov chain Monte Carlo was used to estimate the parameters of the log-logistic distribution, comparing Bayesian estimates with maximum likelihood estimates on real cancer data. Furthermore, a study was conducted to select the appropriate distribution between the log-Gaussian and log-logistic distributions from lifetime data [24]. A detailed description of the distributions is given in the supplementary material.
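For reference, the standard probability density functions of the two lifetime distributions discussed above are given below; the exact parameterization used in the supplementary material may differ, and the roles assigned to β (location/scale) and γ (scale/shape) here are assumptions.

```latex
% Log-Gaussian (lognormal): beta = location of ln t, gamma > 0 = scale, t > 0
f_{\mathrm{LG}}(t;\beta,\gamma)=\frac{1}{t\,\gamma\sqrt{2\pi}}
  \exp\!\left(-\frac{(\ln t-\beta)^{2}}{2\gamma^{2}}\right)

% Log-logistic: beta > 0 = scale (median), gamma > 0 = shape, t > 0
f_{\mathrm{LL}}(t;\beta,\gamma)=\frac{(\gamma/\beta)\,(t/\beta)^{\gamma-1}}
  {\left(1+(t/\beta)^{\gamma}\right)^{2}}
```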

C. Sleep Stage Classification Model
Machine learning-based automatic sleep classification models have been used to analyze complex and difficult-to-interpret EEG [25]. These models perform automatic sleep stage classification in the time domain [26], [27], the frequency domain [28], [29], and the time-frequency domain [30], [31]. However, feature extraction and classification in each domain require prior domain knowledge.
Deep learning-based models are faster and outperform conventional sleep classification models without requiring prior domain knowledge. Deep learning-based sleep classification models mainly use convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers. First, CNN-based models perform well in sleep classification because they can extract neighboring spatial features using fewer parameters than a multi-layer perceptron (MLP) [32], [33]. Second, RNN-based models, which compactly summarize past information using a memory state, also perform well [7], [34]. Lastly, transformers [35], originally used in natural language processing, have been applied in a variety of fields because they can capture long-range dependencies within data, and have recently been used in sleep classification models [8], [36].
One of the advantages of deep learning-based models is that researchers can combine different models to improve performance. Previous studies have improved performance by combining different deep learning-based models. First, a model that uses multi-layer perceptrons to extract features and combine the temporal information of the features has been proposed [37]. Moreover, some studies used CNNs to extract local signal features and RNNs to combine temporal dependencies [32], [38], [39], [40], [41]. Lastly, models using both CNNs and transformers, or multiple transformer models, have been proposed to improve classification performance [8], [36].

A. Model
The proposed model comprises three main structures (Fig. 1). First, the input signal is divided into epochs; features are then extracted through a convolutional neural network and an MLP; and finally, the epoch-to-epoch features are combined using a transformer. Second, the transformer output is divided between a reconstruction process and a distribution estimation process. In the reconstruction process, an autoencoder-style decoder learns to reproduce the input signal. The distribution estimation process estimates the two parameters of the prior distribution and π_i, the probability of following that distribution. We also use a clustering process that increases the difference between the ith and jth distributions to separate the clusters.
1) Transformer Module: A transformer-based encoder was used to extract features from the input signal X ∈ ℝ^{1×s}. First, the input signal is split into sub-signals X_sub ∈ ℝ^{(s/p)×p} with patch size p [35]. The sub-signals are then converted using depth-wise convolution and an MLP.
Next, the converted sub-signals X_sub, with a constant term appended, were fed into the transformer encoder after the addition of positional encoding. In this study, trainable 1D positional encoding with initial values between zero and one was used. For simplicity, the transformer encoder widely used in its original form is adopted [42]. Consequently, Z ∈ ℝ^{(s/p+1)×D} is fed into the transformer.
The key architecture of the transformer is multi-head attention (MHA(·)) built from self-attention (ATT(·)). Z is linearly transformed into the keys, queries, and values, and the ith self-attention ATT_i(·) is computed from these three components. Self-attention is expanded over H heads for each key, query, and value (Z = {Z^(1), Z^(2), ..., Z^(H)}), and the expanded self-attentions are concatenated in MHA(·). Finally, the transformer output Z ∈ ℝ^{(s/p+1)×D} is passed through a linear layer with layer normalization (LN) and ReLU, yielding the representation features R_1 and R_2.

2) Reconstruction Process: In this process, R_1 is used to restore the input signal by applying the MLP and depth-wise convolution in the opposite direction. This reconstruction process resembles the information reduction and restoration of an autoencoder, so the extracted features are guided to return to the original signal. Therefore, R_1 can retain the important information of the input signal and exclude unimportant features.
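As an illustration, the following is a minimal PyTorch sketch of the epoch feature extractor described above (patching, depth-wise convolution, MLP, learnable token with trainable positional encoding, and a transformer encoder). The layer sizes, kernel size, and the way R_1 and R_2 are taken from the encoder output are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the epoch feature extractor; hyper-parameters are illustrative.
import torch
import torch.nn as nn

class EpochEncoder(nn.Module):
    def __init__(self, s=3750, p=125, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        assert s % p == 0
        self.p, self.n_patch = p, s // p
        # depth-wise convolution over each patch, then an MLP to d_model
        self.dw_conv = nn.Conv1d(self.n_patch, self.n_patch,
                                 kernel_size=7, padding=3, groups=self.n_patch)
        self.patch_mlp = nn.Sequential(nn.Linear(p, d_model), nn.ReLU(),
                                       nn.Linear(d_model, d_model))
        # learnable token (the "+1" position) and trainable 1D positional encoding
        self.token = nn.Parameter(torch.rand(1, 1, d_model))
        self.pos = nn.Parameter(torch.rand(1, self.n_patch + 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Sequential(nn.LayerNorm(d_model),
                                  nn.Linear(d_model, d_model), nn.ReLU())

    def forward(self, x):                      # x: (B, 1, s) raw 30 s EEG epoch
        b = x.size(0)
        x = x.view(b, self.n_patch, self.p)    # split into sub-signals (B, s/p, p)
        x = self.dw_conv(x)                    # depth-wise convolution per patch
        z = self.patch_mlp(x)                  # (B, s/p, d_model)
        z = torch.cat([self.token.expand(b, -1, -1), z], dim=1) + self.pos
        z = self.head(self.encoder(z))         # (B, s/p + 1, d_model)
        r1 = z[:, 1:, :]                       # features used for reconstruction
        r2 = z[:, 0, :]                        # feature used for the parametric head
        return r1, r2
```

In this sketch, the token position is used for the parametric feature and the remaining positions for reconstruction; the paper does not specify this split, so it is only one plausible reading of Fig. 1.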
3) Parametric Process: The parametric features R_2 are used to estimate the distribution parameters by passing them through three MLPs. MLP_1 and MLP_2 consist of a linear layer with batch normalization and ReLU. MLP_3 is the same as MLP_1 and MLP_2 but with batch normalization and a softmax layer added at the end. The output of MLP_3 is the probability of being assigned to a cluster.
The outputs of MLP_1 and MLP_2 are the two parameters of the distribution of size k, and the output of MLP_3 is the probability of belonging to each distribution. The assumed parameters of the distributions are defined in positive real space, except for the center parameter of the log-Gaussian distribution. To convert the estimated parameters to positive numbers, we applied POS(·), which consists of an exponential and a logarithmic function.
Because learning is performed in batches, each parameter is estimated from samples of size B. Because values of different magnitudes can be estimated in each batch, it is necessary to set a baseline value for each batch. As a result, we used β_prior ∈ ℝ^1 and γ_prior ∈ ℝ^1 as shared parameters that serve as the reference parameters of the distribution.
Here, W_β, W_γ ∈ ℝ^k, and F_k(x) is the cumulative distribution function of the prior distribution.
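The following is a minimal sketch of the parametric head, assuming POS(·) is the softplus function log(1 + exp(x)) and assuming a simple additive combination of the shared priors with the per-cluster head outputs; neither assumption is confirmed by the paper.

```python
# Hedged sketch of the parametric head: three MLPs on R2 produce per-cluster
# parameters (beta_k, gamma_k) and mixture weights pi_k. POS(.) is assumed to
# be softplus, and the prior combination is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParametricHead(nn.Module):
    def __init__(self, d_model=128, k=4):
        super().__init__()
        def mlp(out_dim):
            return nn.Sequential(nn.Linear(d_model, d_model),
                                 nn.BatchNorm1d(d_model), nn.ReLU(),
                                 nn.Linear(d_model, out_dim))
        self.mlp_beta, self.mlp_gamma, self.mlp_pi = mlp(k), mlp(k), mlp(k)
        # shared (reference) parameters of the prior distribution
        self.beta_prior = nn.Parameter(torch.zeros(1))
        self.gamma_prior = nn.Parameter(torch.zeros(1))

    def forward(self, r2):                                      # r2: (B, d_model)
        beta = self.beta_prior + self.mlp_beta(r2)              # location, (B, k), unconstrained
        gamma = F.softplus(self.gamma_prior + self.mlp_gamma(r2))  # scale > 0, (B, k)
        pi = F.softmax(self.mlp_pi(r2), dim=-1)                 # cluster probabilities, (B, k)
        return beta, gamma, pi
```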

B. Loss
In this section, three losses are defined. The first loss evaluates the reconstructed signal against the input signal using R_1. The second loss evaluates the k SOL distributions f_k(t; β_k, γ_k), which are composed of the output parameters. The third loss evaluates π_k, the probability of belonging to each of the k distributions. Here, f_k(t; β_k, γ_k) denotes the density when the SOL distribution is assumed to be log-Gaussian; the other distributions are shown in the supplementary material.

1) Reconstruction Loss: The reconstruction loss L_recon evaluates the similarity between the reconstructed X_recon and the input X. Therefore, we use the mean squared error to evaluate the difference between the two signals.
2) Distribution Loss: Maximum likelihood estimation (MLE) is used to estimate the distribution parameters. The MLE determines the parameters that maximize the likelihood of the data. The likelihood function is calculated over the observed data sampled with batch size B. Because the assumed distributions belong to the exponential family, taking the log-likelihood turns the product of probabilities into a summation. Consequently, we use the negative log-likelihood as the distribution loss (L_distri).

3) Cluster Loss: The cluster loss is trained to increase the distance between the clusters that are closest to each other by comparing them in a pairwise manner. For this purpose, we propose a modified score F_{i,j}, which is a modification of the T-score (T_{i,j}) used in statistical testing. The square of a T-statistic with df degrees of freedom follows an F_{1,df} distribution; accordingly, F_{i,j} = 1/T_{i,j}^2 follows an F-distribution with degrees of freedom (n_i + n_j − 2, 1), and a small F_{i,j} means that the mean difference between the two groups is large. Therefore, F_{i,j} is a pairwise statistical distance between clusters: the smaller this indicator, the farther apart the cluster centers, which indicates that the clustering performed well. Consequently, we improved the classification performance by including in the model, through L_clus, the loss of the two groups with the closest center distance.
Here, i, j ∈ {1, 2, ..., k}, n_i is the sample size of the ith cluster, c_i is the center (mean) of the ith cluster, and s_i is the sample standard deviation of the ith cluster.
Consequently, the total loss L is the weighted sum of L_recon, L_distri, and L_clus with weights w_r, w_d, and w_c.
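A minimal sketch of the three losses and their weighted sum follows. The mixture negative log-likelihood form, the Welch-style t statistic inside F_{i,j}, and the use of only the single closest cluster pair are assumptions consistent with the description above, not the authors' exact code; the default weights mirror the ablation values reported later (Table VI).

```python
# Hedged sketch of L_recon, L_distri, L_clus and the weighted total loss.
import math
import torch

def lognormal_logpdf(t, beta, gamma):
    # log f(t; beta, gamma) for the log-Gaussian distribution, t > 0
    return (-torch.log(t * gamma * math.sqrt(2 * math.pi))
            - (torch.log(t) - beta) ** 2 / (2 * gamma ** 2))

def recon_loss(x, x_recon):
    return torch.mean((x - x_recon) ** 2)               # mean squared error

def distribution_loss(t, beta, gamma, pi):
    # negative log-likelihood of the k-component mixture; t: (B,), others: (B, k)
    log_mix = torch.logsumexp(torch.log(pi + 1e-8)
                              + lognormal_logpdf(t.unsqueeze(-1), beta, gamma), dim=-1)
    return -log_mix.mean()

def cluster_loss(sol_pred, assign):
    # F_ij = 1 / T_ij^2 for the pair of clusters with the closest centers;
    # sol_pred: (B,) predicted SOL, assign: (B,) hard cluster index
    worst = torch.zeros((), device=sol_pred.device)
    ids = assign.unique()
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            gi, gj = sol_pred[assign == ids[a]], sol_pred[assign == ids[b]]
            if len(gi) < 2 or len(gj) < 2:
                continue
            t_stat = (gi.mean() - gj.mean()) / torch.sqrt(
                gi.var() / len(gi) + gj.var() / len(gj) + 1e-8)
            worst = torch.maximum(worst, 1.0 / (t_stat ** 2 + 1e-8))
    return worst

def total_loss(x, x_recon, t, beta, gamma, pi, sol_pred, assign,
               w_r=0.4, w_d=0.1, w_c=1.0):
    return (w_r * recon_loss(x, x_recon)
            + w_d * distribution_loss(t, beta, gamma, pi)
            + w_c * cluster_loss(sol_pred, assign))
```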

C. Falling Asleep Process
In survival analysis, the survival distribution can be used to estimate the survival probability over time. Therefore, we can estimate the falling asleep process (FAP), which represents the process of falling asleep over time, using the weighted sum of the SOL distributions. Here, f_k(t) is the kth probability density function and F_k(t) is the kth cumulative distribution function. Because the FAP represents the probability of not yet having fallen asleep at time t, it is a decreasing function defined on the positive real numbers. For the point estimation of the SOL, the median residual time c_k [43], the time at which the probability of the subject being awake is halved under the kth distribution, is used. We used the weighted sum of the c_k of the estimated distributions for the final SOL prediction, but applied hard clustering so that each participant belongs to only one cluster for the cluster evaluation.
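A minimal sketch of the FAP and the SOL point estimate is given below, using SciPy's lognorm as the log-Gaussian. The mapping of (β, γ) to SciPy's (scale = exp(β), s = γ) and the example parameter values are assumptions for illustration.

```python
# Hedged sketch of FAP(t) = sum_k pi_k * (1 - F_k(t)) and the SOL point estimate.
import numpy as np
from scipy.stats import lognorm

def fap(t, beta, gamma, pi):
    """Probability of still being awake at time t (decreasing in t)."""
    surv = np.array([lognorm.sf(t, s=g, scale=np.exp(b)) for b, g in zip(beta, gamma)])
    return np.dot(pi, surv)                       # weighted sum over the k clusters

def sol_estimate(beta, gamma, pi):
    """Weighted sum of per-cluster median residual times c_k (survival = 0.5)."""
    c = np.array([lognorm.isf(0.5, s=g, scale=np.exp(b)) for b, g in zip(beta, gamma)])
    return float(np.dot(pi, c)), int(np.argmax(pi))   # point estimate, hard cluster

# example with hypothetical parameters for k = 4 clusters (minutes)
beta, gamma = np.log([4.0, 8.0, 14.0, 22.0]), np.array([0.4, 0.4, 0.5, 0.5])
pi = np.array([0.6, 0.2, 0.1, 0.1])
print(fap(10.0, beta, gamma, pi), sol_estimate(beta, gamma, pi))
```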
The purpose of this study is to predict SOL and divide SQ into clusters using 30 s of EEG. Therefore, we evaluate whether R_1 is well restored (L_recon), whether R_2 accurately estimates the distribution of SOL (L_distri), and whether the clusters of the separate distributions are well classified (L_clus).

D. Evaluation Metrics
We used three evaluation metrics: mean absolute error (MAE), concordance index (C-index), and multiple F-score (MFS). First, the MAE evaluates the accuracy of the prediction. The MAE is the error between the true SOL and the predicted SOL; the lower the MAE, the better the performance.
Second, the concordance index (C-index) [44] compares the order of the predicted values with that of the actual values. The C-index is commonly used in survival analyses to evaluate the correct ordering of lifetimes. A C-index of 1.0 indicates perfectly ordered predictions, 0.5 indicates random predictions, and 0.0 indicates perfectly reversed order.
Last, the multiple F-score (MFS) was used to compare the performance of the classified clusters. The MFS measures the degree of difference between the cluster means and was calculated using the F-statistic of the analysis of variance (ANOVA) test. In the ANOVA test, a larger F-statistic indicates a difference between the group means, and the baseline is the significance level of the F-distribution. In this study, we used the inverse of the F-statistic, which still follows an F-distribution, to assign a score for group heterogeneity; the closer it is to zero, the better. As a result, the MFS is small when the group heterogeneity is large; the smaller the value, the better the performance.
where k is the number of clusters, N is the total number of samples, X_{i,j} is the ith sample from the jth cluster, X̄_j is the mean of the jth cluster, and X̄ is the total mean.
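A minimal sketch of the three evaluation metrics follows. MAE, the pairwise C-index, and the ANOVA F-statistic follow their standard definitions; taking the MFS as the inverse of the F-statistic is an assumption based on the description above.

```python
# Hedged sketch of MAE, C-index, and MFS (inverse ANOVA F-statistic).
import numpy as np
from scipy.stats import f_oneway

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def c_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            if (y_pred[i] - y_pred[j]) * (y_true[i] - y_true[j]) > 0:
                concordant += 1
            elif y_pred[i] == y_pred[j]:
                concordant += 0.5
    return concordant / comparable

def mfs(values, labels):
    """Inverse ANOVA F-statistic over the clusters; closer to zero is better."""
    values, labels = np.asarray(values), np.asarray(labels)
    groups = [values[labels == c] for c in np.unique(labels)]
    f_stat, _ = f_oneway(*groups)
    return 1.0 / f_stat
```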

E. Experimental Setup
First, we randomly divided all participants into 10 folds; one fold was used as the test set to compare model performance, and the remaining nine folds were split again into an 80% training set and a 20% validation set. In other words, the model was trained on 80% of the data in the nine folds, and early stopping was performed when the evaluation metrics did not improve for 50 epochs on the validation set. The final evaluation of the models was based on their performance on the test set.
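A minimal sketch of this participant-level split is given below; the function and variable names are illustrative, and the seed handling is an assumption.

```python
# Hedged sketch of the 10-fold split with an inner 80/20 train/validation split.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def make_splits(participant_ids, n_folds=10, seed=0):
    ids = np.asarray(participant_ids)
    splits = []
    for train_val_idx, test_idx in KFold(n_folds, shuffle=True, random_state=seed).split(ids):
        # remaining nine folds -> 80% training, 20% validation
        train_idx, val_idx = train_test_split(train_val_idx, test_size=0.2, random_state=seed)
        splits.append((ids[train_idx], ids[val_idx], ids[test_idx]))
    return splits  # early stopping on validation, final report on the test fold
```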
There were eight hyper-parameters for the proposed model. Our training environment was PyTorch 1.7.1, trained on a 2080 Ti GPU. For training, the batch size (B) was 64 and the learning rate was 1e-4. We experimented with each hyper-parameter and selected the values with the best performance, as shown in Table II. More details on the hyper-parameters and the experimental setup are given in the supplementary material.

F. Data
As deep learning requires a large number of participants, the Sleep Heart Health Study (SHHS) was used. The SHHS dataset is from a multicenter cohort clinical trial that investigated the cardiovascular and other outcomes of sleep-disordered breathing [45], [46]. Data were collected as part of this study (ClinicalTrials.gov number: NCT00005275). The participants had various conditions, including lung, cardiovascular, and coronary artery diseases. We used the night sleep study EEG data and morning survey data from the SHHS1 database, which includes 5,793 participants. The EEG data included the C3-A2 and C4-A1 EEG channels, sampled at 125 Hz. In the present study, we selected data from the C4-A1 channel.
If the observed SOL value was 0 min, it was changed to a minimum value of 0.5 min. The SQ score ranges from 0 to 14; the smaller the score, the better the quality of sleep. In a previous study, a total SQ score of < 6 indicated good SQ [16]. Statistics for the SOL and SQ scores are presented in Table III. More details on the variables related to the six components of the SHHS are provided in the supplementary material.
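A minimal sketch of this label preprocessing is shown below; the variable names are hypothetical placeholders, not the actual SHHS column names.

```python
# Hedged sketch of the SOL clipping and the good-SQ label described above.
import numpy as np

def preprocess_labels(sol_minutes, sq_score):
    sol = np.maximum(np.asarray(sol_minutes, dtype=float), 0.5)  # 0 min clipped to 0.5 min
    good_sq = np.asarray(sq_score) < 6                           # SQ score < 6 = good sleep
    return sol, good_sq
```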

Determining the number of clusters is important in cluster analysis. We considered prior research, the characteristics of the SOLs, and performance comparisons based on the number of mixture distributions. As a result, we determined that four clusters were appropriate. The details are given in the "Finding the number of clusters" section of the supplementary material.
We compared the accuracy of SQ and SOL clustering with that of conventional clustering algorithms for time-series data. However, since conventional clustering algorithms do not include a process for predicting SOL, a direct comparison was difficult. We used conventional k-means clustering for time series [47], k-shape [48], and Unsupervised Discriminative Feature Selection (UDFS) [49]. Moreover, DeepSurv [50] and DeepHit [51], survival models for time prediction, were used to compare the accuracy of SOL prediction. We also compared a variant of the proposed model that estimates the SOL using a Gaussian distribution. Model performance was evaluated by comparing the MFS of the SOL and SQ scores and by checking whether the result matched the initial number of clusters without divergence.

A. Comparison With Baseline Method
The advantage of predicting the SOL distribution is the ability to visualize the estimated time distribution using the FAP. Therefore, we can estimate the FAP, which represents the process of falling asleep, using the SOL distribution. Because the FAP represents the probability of not yet having fallen asleep over time, it is a decreasing function of time. First, we calculated FAP_k for each of the four log-Gaussian distributions. Next, we obtained the final FAP by a weighted summation of the FAP_k using the probabilities of belonging to each cluster. Fig. 2 shows the predicted FAP of individual participants in each of the four groups.
The results of these experiments are presented in Table IV. The model using a log-Gaussian distribution showed the best performance in predicting SOL (9.8 min, 0.722). However, the model using the log-logistic distribution showed the best performance in the MFS of SOL, with 0.071. The clustering performance for the SQ score was 0.021 for both the log-Gaussian and log-logistic models, which showed the best performances. On the other hand, the model using a Gaussian distribution, which is widely used in the natural sciences, showed the third-best performance in predicting SOL (11.8 min, 0.689). Conventional clustering models diverged or were under-clustered on some folds, whereas the proposed model showed good performance on all metrics. Among the survival models that predict the time to sleep onset, DeepSurv showed good performance with an MAE of 9.8, and DeepHit showed good performance with a C-index of 0.702. Overall, the model using a log-Gaussian distribution, with accurate SOL prediction and good clustering performance, was selected as the best.
The final model, which estimated the SOL distribution using the log-Gaussian distribution and classified the groups using the four distributions, showed the best performance. Furthermore, to compare the differences between the clusters, we sorted the clusters in ascending order of SOL and then compared the centers of the clusters with respect to the SOL and SQ scores for each fold. We performed an ANOVA on the four clusters over all test folds (Table V). We found statistically significant differences across all clusters in SOL and SQ (p < 0.001, p < 0.001). Furthermore, we used the Tukey HSD post-hoc test to compare the groups [52]. All four clusters of SOL were assigned to different groups, whereas SQ showed no statistically significant difference between (1) and (2) but showed differences between all other groups. As a result, there was no statistically significant difference between groups (1) and (2), and they were good SQ groups because their scores were < 6. More detailed tables of the cluster centers for all folds are shown in the supplementary material.
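A minimal sketch of this statistical comparison (one-way ANOVA across the four clusters followed by a Tukey HSD post-hoc test) is shown below; variable names are illustrative and the default significance level of the statsmodels routine is assumed.

```python
# Hedged sketch of the ANOVA and Tukey HSD comparison of cluster centers.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def compare_clusters(values, cluster_labels):
    values, cluster_labels = np.asarray(values), np.asarray(cluster_labels)
    groups = [values[cluster_labels == c] for c in np.unique(cluster_labels)]
    f_stat, p_value = f_oneway(*groups)                  # overall group difference
    tukey = pairwise_tukeyhsd(values, cluster_labels)    # pairwise post-hoc comparison
    return f_stat, p_value, tukey.summary()
```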

B. Association Between SOL and Sleep Quality
The cluster results for each fold of the test set were compared with the overall test results (Fig. 3). Individuals with good-quality sleep had an SQ score of < 6. A 95% confidence interval (CI) plot for each of the four clusters classified in this study is shown in Fig. 3. In Fig. 3(b), the red line represents the boundary between the good-sleep and poor-sleep groups, divided at the score of five proposed in a previous study. Among the groups, (1) was the good sleep group and (4) was the poor sleep group. By contrast, clusters (2) and (3) in Fig. 3(a) are divided by SOL, and a similar classification is possible if the baseline is placed between 5 and 15 min. In Fig. 3(c), the CI plot for the overall test set of each fold can be assigned to a cluster that matches the good-quality score if the SOL is within a range smaller than 15 min. Finally, we determined the appropriate SOL standard based on the SQ score.
Fig. 4 compares the existing SOL score with the score proposed in this study. To compare SOL and SQ, we first divided SOL into an upper score (SOL ≥ threshold) and a lower score (SOL < threshold). We then compared the classification accuracy and F1 score against good SQ (SQ score < 6) and bad SQ (SQ score ≥ 6). The accuracy was highest at 15 min, and the F1 score was highest at 5 min. However, the overall average score was highest at 10 min; therefore, SOL is most closely related to good SQ when compared against a 10 min standard (a minimal sketch of this threshold comparison is given at the end of this subsection).

To compare questionnaire-based assessments and SQ, we performed multiple correspondence analysis (MCA) [53]. MCA is a dimension-reduction technique that represents the relationships between multiple categorical variables in a low-dimensional space. Its basic principle is to represent each category of each variable as a point in a multidimensional space and then project these points onto a lower-dimensional subspace in a way that preserves as much of the original information as possible. The output of MCA is a set of coordinates for each category in the low-dimensional space, together with eigenvalues and eigenvectors that represent the amount of variance explained by each dimension.
Fig. 5 shows the results of the MCA, wherein the closer the distance between two categories, the higher their similarity. The three variables used to compare the qualitative assessments were obtained from the "Morning Survey" of the SHHS data, which included the following responses: "Difficulty falling asleep," "Quality of sleep compared to usual," and "Quality of sleep light/deep." The MCA results showed that good SQ mostly included responses such as "Normal," "Usual," and "Not difficult," whereas poor SQ included "Light," "Poor," and "Difficult." In particular, the distance between "not difficult to sleep" and good SQ was the shortest; therefore, SOL was a suitable latent variable for good SQ.
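The following is a minimal sketch of the SOL-threshold comparison shown in Fig. 4: for each candidate threshold, "good sleep quality" is predicted when SOL is below the threshold and compared against the SQ label (score < 6), reporting accuracy and F1. The function name, variable names, and candidate thresholds are illustrative.

```python
# Hedged sketch of the SOL-threshold vs. good-SQ comparison.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compare_thresholds(sol_minutes, sq_scores, thresholds=(5, 10, 15, 20)):
    good_sq = np.asarray(sq_scores) < 6             # 1 = good sleep quality
    results = {}
    for thr in thresholds:
        pred_good = np.asarray(sol_minutes) < thr   # 1 = SOL below the threshold
        acc = accuracy_score(good_sq, pred_good)
        f1 = f1_score(good_sq, pred_good)
        results[thr] = (acc, f1, (acc + f1) / 2)
    return results   # pick the threshold with the highest average of accuracy and F1
```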

C. Ablation Study
We conducted experiments to investigate the necessity of the reconstruction structure and to find the optimal loss weight parameters. We found that the best performance was achieved when the reconstruction module was included and the loss weight parameters w_c, w_d, and w_r were set to 1.0, 0.1, and 0.4, respectively (Table VI). We also experimented to find the best combination of loss weight parameters; more detailed results of the parameter tuning are shown in the supplementary material.
Furthermore, we experimented with predicting and clustering other latent variables that affect SQ using the first 30 s of EEG. The appropriate latent feature was determined by comparing the target prediction performance, the accuracy of the clusters, and the number of under-clustered results. Predicting total sleep time (TST), wake after sleep onset (WASO), and sleep disturbance (SD) using the early EEG showed poor target prediction performance, as the C-index was close to the chance level of 0.5 (Table VII). Except for SOL, the number of clusters found was smaller than the initial setting of four, and the accuracy of the SOL clusters was the best, with an MFS SQ score of 0.077 and an MFS SOL of 0.021. In conclusion, an EEG obtained early in the sleep cycle can be used to predict and cluster SOL, and the classified clusters can help predict SQ scores. However, for TST, WASO, and SD, which require monitoring of the overall sleep process, the target prediction performance is not good.

V. DISCUSSION
Currently, overnight PSG is the gold standard for measuring and assessing SQ using various sleep parameters [54]. PSG-based sleep parameter measurement has the limitation that it is performed not in the study subject's home environment but in a sleep laboratory [14]. While it is difficult to compare the average effect because sleep is measured on a single night, PSG is still more reliable than actigraphy, sleep wearables, and questionnaire assessment, which have been presented as alternatives [55], [56]. In this study, we demonstrated the feasibility of SOL prediction using an EEG obtained early in the sleep cycle and proposed a model to classify SQ.
In this study, SOL was categorized into four clusters, and SQ was compared for each cluster. The results showed that the four groups in the test set were statistically different according to the ANOVA test. The Tukey HSD test showed that the SOLs were all statistically significantly different, but the SQ scores were not statistically different between (1) and (2). After all, (1) and (2) represent good SOL within 5 minutes and good sleep quality, (3) represents moderately good SOL of 13.2 minutes and bad sleep quality, and (4) represents poor SOL of 22.5 minutes and bad sleep quality. The proposed model extended the results of classifying and predicting SOL as a latent feature to the classification of SQ. The clustering results and the MCA results showed a relationship between SOL and SQ; furthermore, the proposed model showed better performance than SOL prediction using survival models. The ablation study also showed that the other sleep components were not suitable for prediction from early EEG because they require the full overnight sleep duration.
The autoencoder is a neural network that reconstructs its input as output, performing compression and restoration to extract features from the input data. During this process, the autoencoder retains the important features of the input data while removing unnecessary information. In previous work, transformer-based autoencoder models have been proposed and have shown good performance [57], [58]. In our experiments, we also found that using the reconstruction process improves the performance of the model.
The cluster loss, measured from the cluster centers, can be used to compare the dissimilarity of clusters. In most cases, the distance used is the Euclidean distance, which has the advantage of being simple and fast to compute but is sensitive to outliers and does not account for the variance of the data [59]. In this study, we estimated the distribution of SOLs through the model, so we used a statistical loss that considers the variance [60], [61]. As a result, the performance of the model improved when the cluster loss was included.
Estimating the SOL distribution yields comprehensive and robust predictions because it provides inferences about a population using a small number of parameters. To select an appropriate distribution, we compared three time-related distributions and the Gaussian distribution and found that the log-Gaussian distribution performed best. In addition, we classified participants by using the π of the four log-Gaussian distributions as the cluster assignment ratio. As a result, the proposed method was able to perform the prediction and classification of SOL simultaneously.
The significance of our study is that it is possible to predict SOL from an early (30 s) EEG and to detect good SQ early through the relationship between SOL and sleep quality. Since SOL is particularly associated with insomnia, and insomnia can clinically cause low quality of life, anxiety, depression, and stress, predicting the process of falling asleep in advance in people with insomnia can contribute to improving their sleep quality [62]. In addition, early detection of SQ, which otherwise requires prolonged observation, can lead to improvement of sleep quality in the classified groups [63].
Using PSG, which can quantitatively measure sleep, would allow for a more accurate prediction of the sleep state and SOL. As the current study utilized only single-channel EEG, it is necessary to use multi-channel EEG or other PSG signals such as EOG and EMG to improve the model performance. Moreover, although SOL is an important variable for good SQ, the relationships among various features must be considered to predict SQ accurately. In this study, we showed the association between good SQ and SOL, but it is difficult to classify very poor SQ (SQ score > 9) with SOL alone.
Therefore, in further studies, it is necessary to develop a model using multivariate sleep components.
Our future work is to propose a real-time sleep quality prediction framework that can determine the quality of sleep at an early stage and alleviate the suffering of people with sleep disorders.

VI. CONCLUSION
We proposed a deep-learning-based model for estimating the SOL distribution using an early 30 s EEG and found a relationship between SOL and SQ. The SOL was fitted with four log-Gaussian distributions, which showed the best predictive performance, and the four clusters were well classified, showing statistically significant differences. The divided clusters were further grouped into good and poor SQ. In addition, our study compared the SQ and SOL scores and found that an SOL within 10 min was highly associated with good SQ.

Fig. 1. Illustration of the proposed model structure. The proposed model consists of three processes: first, the input electroencephalogram (EEG) signal is separated into epochs, and the representation features are extracted using the transformer model. Second, one of the representation features is used to reconstruct the EEG signal, and the others are used to estimate the distribution of latent variables for classifying sleep quality. Finally, the classification performance for sleep quality is improved by considering the cluster loss.

Fig. 2. Prediction of the falling asleep process (FAP) for individual participants A, B, and D from different clusters. The FAP is a visual representation of the process of a participant falling asleep over time. The solid line is the final FAP estimated by the weighted sum of the four distributions. The red dashed line marks a probability of being awake of 0.5, and the prediction of sleep onset latency (SOL) uses the median residual time.

Fig. 4. Comparison of accuracy and F1 score for binary classification of bad sleep onset latency (≥ threshold) and bad sleep quality (> 6). The black dashed line indicates the threshold at which the average of accuracy and F1 score is largest.

Fig. 5. Multiple correspondence analysis for the components of sleep quality. The x-axis and y-axis represent the first and second principal components, respectively. The red circle indicates poor sleep quality, and the blue circle indicates good sleep quality.

TABLE I: COMPONENTS OF THE SCORES. A LOW SCORE INDICATES GOOD SLEEP QUALITY. THE SQ SCORE IS THE SUM OF ALL SIX COMPONENT SCORES.

TABLE III: STATISTICS OF SOL AND SLEEP QUALITY SCORES IN THE SHHS DATASETS.

TABLE IV: COMPARISON OF MODEL PERFORMANCE FOR THE FOUR CLUSTERS OVER 10 FOLDS. THE BEST VALUES FOR EACH CLUSTER ARE HIGHLIGHTED IN BOLD AND THE SECOND-BEST VALUES ARE UNDERLINED.

TABLE V: DESCRIPTIVE STATISTICS FOR THE CENTERS OF THE FOUR CLUSTERS IN ALL TEST FOLDS.

TABLE VI: ABLATION STUDY OF THE LOSS WEIGHT PARAMETERS (w_c, w_d, w_r) WITH 4 CLUSTERS USING THE LOG-GAUSSIAN DISTRIBUTION. THE BEST VALUES IN EACH CASE ARE HIGHLIGHTED IN BOLD.

TABLE VII: RESULTS OF DIFFERENT LATENT VARIABLES FOR SLEEP QUALITY. THE BEST VALUES OF EACH CLUSTER ARE HIGHLIGHTED IN BOLD.