A Connectivity-Aware Graph Neural Network for Real-Time Drowsiness Classification

Drowsy driving is one of the primary causes of driving fatalities. Electroencephalography (EEG), which measures brain activity directly, has been widely used for detecting driver drowsiness in real time. Recent studies have revealed the great potential of using brain connectivity graphs constructed from EEG data for drowsy state prediction. However, traditional brain connectivity networks are constructed independently of the downstream prediction task and are therefore not task-relevant. This article proposes a connectivity-aware graph neural network (CAGNN) using a self-attention mechanism that can generate task-relevant connectivity networks via end-to-end training. Our method achieved an accuracy of 72.6% and outperformed other convolutional neural networks (CNNs) and graph generation methods on a drowsy driving dataset. In addition, we introduced a squeeze-and-excitation (SE) block to capture important features and demonstrated that the SE attention score can reveal the most important feature band. We compared our generated connectivity graphs in the drowsy and alert states and found drowsiness connectivity patterns, including significantly reduced occipital connectivity and interregional connectivity. Additionally, we performed a post hoc interpretability analysis and found that our method could identify drowsiness features such as alpha spindles. Our code is available online at https://github.com/ALEX95GOGO/CAGNN.


I. INTRODUCTION
Drowsy driving is one of the leading causes of traffic accidents [1]. As drowsiness is the first sign of falling asleep, it is believed that early drowsiness detection could be useful for preventing related traffic accidents. Camera-based methods are widely used to monitor driver drowsiness. However, these methods may lead to privacy concerns and are susceptible to facial occlusion and illumination variations [2]. Electroencephalography (EEG) does not have these issues and can be used to directly measure brain activity. Recently, EEG has been integrated into driver monitoring systems for drowsiness detection. For example, researchers at Mercedes-Benz investigated driver fatigue detection with alpha spindles based on the "Attention assist" system [3]. In drowsiness studies, EEG has been used to indicate the transition between the waking and sleeping states [4]. Compared with other frequency bands, the alpha (8-14 Hz) and theta (4-8 Hz) bands have repeatedly been shown to be more closely associated with drowsiness [5], [6], [7], [8]. For instance, alpha spindles [6] and theta bursts [9], [10] are two prominent features that occur before falling asleep. In addition, increased theta and alpha activity and decreased beta activity have been observed in cases of mental fatigue [7]. In particular, power oscillations in the theta and alpha bands are highly associated with monotonous driving tasks [8].
The drowsy and alert states have been shown to be distinguishable via brain connectivity analysis. Brain functional connectivity is defined as the temporal coincidence in neural activities between segregated regions of the brain [11]. Traditionally, functional connectivity graphs based on EEG were constructed via statistical coupling between two different time series using correlation metrics such as the phase lag index (PLI) [12], amplitude locking value (ALV) [13], and weighted phase lag index (WPLI) [14], and the drowsy and alert states were classified using traditional machine learning methods according to the connectivity information. However, traditional machine learning methods usually require handcrafted hyperparameters. This problem can be mitigated through deep learning methods, which act as nonlinear universal approximators for feature extraction.
Convolutional neural networks (CNNs) are popular deep learning models in EEG analysis, including the commonly used EEGNet [15] and ShallowConvNet [16]. Driver drowsiness levels have been successfully detected using CNNs in several studies [17], [18], [19]. In some driver drowsiness studies, the reaction time could be forecasted with a CNN [19], [20]. Cui et al. developed a compact single-channel CNN [17] and a multichannel interpretable CNN [21] for drowsiness detection, and these methods demonstrated better performance than EEGNet and ShallowConvNet; however, their method was designed for offline analysis only, and a real-time implementation decreased the accuracy by approximately 6% [17].
In recent years, researchers have explored graph neural networks (GNNs) for EEG classification [22], [23], [24], [25], [26] to address the limitations of CNNs, since the natural geometries and brain networks in EEG studies are non-Euclidean. The graph in a GNN developed for EEG analysis is usually predefined based on some prior knowledge, such as the distance between electrodes, and the correlation or covariance between EEG channels [22], [24], [25], [26]. However, graphs defined in this manner are not trainable or task-relevant. Moreover, the input feature vectors are fed into the neural network blindly as a black box. In this work, to address these problems, we first applied a self-attention mechanism [27], [28] to generate a task-relevant connectivity graph through end-to-end GNN training. Then, we introduced a squeeze-and-excitation (SE) block [29] to selectively highlight important features, thereby improving the interpretability of our model.
We designed a novel architecture for drowsiness detection as follows. First, we applied self-attention mechanisms to take full advantage of the power of GNNs to generate connectivity patterns. Self-attention mechanisms generate importance scores that relate different positions in a single sequence to an encoded sequence through end-to-end training [30]. Due to its ability to capture long-range dependencies, the self-attention mechanism has demonstrated good performance in machine translation [31], [32] and computer vision tasks [33]. Inspired by the attention mechanism [27], [28], we designed a method to learn an adjacency matrix based on a weighted feature vector, which can capture the interdependencies between EEG channels through end-to-end training.
Furthermore, we introduced an SE block to make the network focus on more informative features in end-to-end training. Originally, the SE block was designed for channelwise attention for image classification tasks, and it has been shown to improve EEG classification accuracy in some studies [34], [35], [36]. Instead of exploring the interdependencies between EEG channels, we applied the SE block to extract the interdependencies between EEG features, which have been overlooked in many previous studies. The results show that the SE block not only improved the classification accuracy but also enhanced the interpretability of our model by highlighting the most important frequency features.
The main contributions of our work can be summarized as follows:
1) We proposed a novel method to generate connectivity patterns in GNNs that outperforms other state-of-the-art CNNs and GNNs in drowsiness classification on both balanced and imbalanced datasets.
2) We showed that the SE attention score of our model indicated the most important frequency features; moreover, when our model was retrained using the frequency band with the highest attention score, the accuracy improved.
3) We demonstrated that our learned connectivity patterns could provide some insights into neuroscience and that our generated connectivity patterns showed weaker parietal and occipital connectivity and decreased interregional connectivity, indicating local segregation effects in drowsy states.

II. RELATED WORK

A. Popular Architectures for EEG Classification
Convolutional neural networks (CNNs) are the most popular architectures for EEG classification thanks to their strong feature extraction abilities. The ShallowConvNet [16] architecture consists of two convolutional layers (a (1, 13) temporal filter and a (C, 1) spatial filter, with C representing the number of channels) and has shown good performance on motor imagery datasets. EEGNet [15] is an improved version of ShallowConvNet that introduces a depthwise separable convolutional layer, which reduces the number of parameters, prevents overfitting, and decouples the relationship within and across each feature map. Cui et al. [17], [21] developed InterpretableCNN and further improved EEGNet by introducing a global average pooling (GAP) layer to replace the widely used fully connected layer, which reduced the number of parameters and prevented overfitting. They demonstrated that InterpretableCNN outperforms EEGNet and ShallowConvNet on a drowsy driving dataset [37]. One of the limitations of CNNs is that they are designed for regular, Euclidean structured data [38], [39], which is a poor match for EEG since the natural geometry and connectivity of brain networks in EEG data are non-Euclidean [22].

B. Graph Neural Networks for EEG Classification
In graph neural networks (GNNs), the convolutional layers in CNNs are replaced with graph convolutional layers. GNNs for EEG classification model the brain as a graph, with the electrodes represented by nodes and the connectivity between them represented by edges. One key challenge of GNNs is selecting appropriate graph representations for EEG signals. This is usually performed by exploring some prior knowledge in the EEG data. For instance, EEG-GCNN [24] defined the graph based on the electrode distance and spectral coherence between different EEG channels. Similarly, Tang et al. [22] constructed a graph based on the electrode distance and correlations between spectral features. In contrast, Jingcong et al. showed that a learnable graph generation method called the self-organized graph neural network (SOGNN) [23] could outperform the traditional correlation and covariance graph representations. However, the graph learned by the SOGNN was highly asymmetric, which might not be the case for healthy individuals [40]. We note two key insights based on these architectures. First, we aim to generate the GNN graph representation through end-to-end training to learn a task-relevant graph for downstream tasks. Second, we aim to generate a biologically meaningful graph that can offer valuable insights into neuroscience.

A. Dataset Description
The dataset in this article was collected at the Brain Research Center (BRC) at National Chiao Tung University (NCTU), Hsinchu, Taiwan. This study was performed in strict accordance with the recommendations in the Guide for the Committee of Laboratory Care and Use of the National Chiao Tung University, Taiwan, with ethics approval (2012-08-019BCY) from the Institutional Review Board of the Veterans General Hospital, Taipei, Taiwan. This dataset contains 32-channel EEG recordings (including one ground electrode and one reference electrode), collected from 27 healthy subjects (age range: 22 to 28 years) in a driving simulator over 62 sessions, with each recording occurring over 90 minutes. The impedances of the EEG electrodes were maintained at less than 5 kΩ, and the sampling rate was 500 Hz. The experiment was conducted in a 360-degree virtual reality (VR) lab [41]. The scenario was a monotonous VR highway scene aiming to induce drowsiness [42]. During the task, lane departure events were induced randomly to make the car drift towards the left or right side. The subjects were instructed to respond as quickly as possible to move the car back to the centre of the lane. Between two consecutive trials, there was one resting period that lasted for a random amount of time ranging from 5 to 10 s to prevent participants from anticipating a potential deviation. The reaction time (RT) was measured to evaluate the driver's performance. The RT was defined as the latency between the onset of the deviation and the onset of the response. A long RT indicates that the participant is relatively drowsy, while a short RT indicates that the participant is relatively alert. A detailed description of this dataset can be found in [37]. The dataset is also publicly available from figshare [43].

B. Data Processing
The dataset was preprocessed with the following steps. First, the raw data were filtered with a finite impulse response (FIR) filter with frequency cut-offs of 1 Hz and 50 Hz. Second, artefacts were automatically removed via the Automatic Artefact Removal (AAR) plug-in for EEGLAB. The AAR method is suitable for real-time applications [44], [45]. Third, all trials with reaction times of less than 100 ms were removed: because human reaction times are unlikely to be less than 100 ms [46], in these cases the subjects might have turned the steering wheel by chance instead of actually responding to the stimuli. We then downsampled the EEG data to 128 Hz and extracted the 5 seconds of samples prior to the stimulus presentation. We included only sessions in which the motion function of the simulator was disabled to minimize motion artefacts. The drowsy samples and alert samples were labelled according to the method proposed in [47]:

TABLE I SUMMARY OF THE SELECTED NUMBER OF SAMPLES FOR EACH SUBJECT IN THE BALANCED AND UNBALANCED DATASET
• The global reaction time (RT) was calculated by averaging all the local RTs within a 90-second window before the onset of deviation [48]. The alert RT was calculated as the 5th quantile of all the local RTs.
• The 5-second samples were labelled "Alert" if both the local and global RTs were shorter than 1.5 times the alert RT and were labelled "Drowsy" if both the local and global RTs were greater than 2.5 times the alert RT.
The eligible trials were selected by the following procedure:
1) Sessions with fewer than 40 samples of either class were discarded.
2) The samples from each session were balanced by choosing the most representative samples with the shortest (for the alert class) or longest (for the drowsy class) local RT [17].
3) For multiple sessions with the same subject, we selected the session with the most samples.
The first step was used to exclude sessions with too few samples in one or both classes for neural network training. The second step ensured a balanced dataset for both classes. For the imbalanced dataset, this step was not used. The third step was used to balance the number of subjects and ensure that the sessions with the most samples were retained. As summarized in Table I, we obtained 1880 samples from 11 subjects for the balanced dataset and 2848 samples for the naturally imbalanced dataset. The resulting arrays have sizes of [1880 samples × 30 channels × 640 timepoints] for the balanced dataset and [2848 samples × 30 channels × 640 timepoints] for the imbalanced dataset.
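The labelling rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `label_trials` and its arguments are hypothetical, the "5th quantile" is interpreted as the 5th percentile, and the current trial's own RT is assumed to fall inside its 90-second window.

```python
import numpy as np

def label_trials(onsets_s, local_rts_s, window_s=90.0,
                 alert_factor=1.5, drowsy_factor=2.5):
    """Label each trial 'alert', 'drowsy', or None (neither rule applies).

    onsets_s: deviation onset time of each trial, in seconds.
    local_rts_s: local reaction time of each trial, in seconds.
    """
    onsets = np.asarray(onsets_s, dtype=float)
    rts = np.asarray(local_rts_s, dtype=float)

    # Alert RT: assumed here to be the 5th percentile of all local RTs.
    alert_rt = np.percentile(rts, 5)

    labels = []
    for i, onset in enumerate(onsets):
        # Global RT: mean of the local RTs in the 90 s window before the onset.
        # Assumption: the current trial is included in its own window.
        in_window = (onsets > onset - window_s) & (onsets <= onset)
        global_rt = rts[in_window].mean()
        local_rt = rts[i]
        if max(local_rt, global_rt) < alert_factor * alert_rt:
            labels.append("alert")
        elif min(local_rt, global_rt) > drowsy_factor * alert_rt:
            labels.append("drowsy")
        else:
            labels.append(None)
    return alert_rt, labels
```

Trials that satisfy neither rule are returned as `None`, which corresponds to samples that are left out of both classes.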

C. Network Architecture
The architecture of the proposed method is shown in Figure 1. The graph in the GNN can be formulated as a set of nodes, edges, and edge weights, G = (N, E, A). Each node in N corresponds to the EEG data of one channel, and the edge weights, denoted by the adjacency matrix A, represent the connectivity between the EEG channels.
Each node in the GNN was constructed by the following steps. First, the 5-second EEG sample was divided into ten 0.5-second EEG segments. Then, to model the EEG spectral features, we used a fast Fourier transform (FFT) to transform the time series segments into frequency spectra and retained only the positive frequencies in the frequency spectrum to remove redundancy. We included frequency bins only in the delta, theta, alpha, and beta bands since these bands are more closely related to drowsiness features. With this approach, we obtained ten frequency spectra, each with 16 frequency bins: $[X_1, X_2, \ldots, X_{16}]$.

Fig. 1. Architecture of the proposed connectivity-aware graph neural network. First, the frequency features were extracted from the EEG time series through FFTs. Then an SE block was used to highlight important frequency features, and a self-attention mechanism was used to construct the graph from the scaled features. Then, the scaled features and graph were fed into a GNN encoder and decoded via a GRU. Finally, a max-pooling layer and a fully connected layer were applied for classification.

Next, we applied an SE block to obtain a set of modulation weights for the input feature vectors. In the SE block, first, a squeeze operation was applied to aggregate the feature vectors [29]:

$$z = F_{sq}(X) = \frac{1}{C \times T'} \sum_{i=1}^{C} \sum_{j=1}^{T'} X(i, j) \tag{1}$$

where $C$ and $T'$ are the first two dimensions of the feature vector $X \in \mathbb{R}^{C \times T' \times F}$, $C$ is the channel dimension, $T'$ is the time dimension, and $F$ is the frequency dimension. The squeeze operation is a global pooling of the feature vector in the channel and time dimensions. Then, an excitation block was applied to learn the sample-specific activation for each feature [29]:

$$s = F_{ex}(z, W) = \sigma\big(W_2\, \delta(W_1 z)\big) \tag{2}$$

where $\delta(\cdot)$ denotes the ReLU [49] function, $\sigma(\cdot)$ denotes the sigmoid function, and $W_1$ and $W_2$ represent two fully connected layers, where $W_1 \in \mathbb{R}^{\frac{F}{r} \times F}$ and $W_2 \in \mathbb{R}^{F \times \frac{F}{r}}$. In this article, we set the squeeze ratio $r$ to 8.

Next, we designed a method to learn the adjacency matrix based on the weighted feature with a self-attention mechanism. The construction of the edges can be expressed as:

$$K = Q = \tanh\big(W_3 (s \odot X)\big), \qquad A = \sigma\big(K Q^{\top}\big) \tag{3}$$

where $\odot$ denotes elementwise multiplication, $X$ is the normalized input feature vector of the EEG channels, the tanh function maps its input to $(-1, 1)$, and $W_3$ denotes a fully connected layer. The key $K$ and query $Q$ are obtained from the same feature vector, and $\sigma$ denotes the sigmoid function, which is applied to map the attention weight to $(0, 1)$ while ensuring a symmetric self-attention score. The self-attention score was used as the adjacency matrix of the GNN.
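The feature extraction and squeeze-and-excitation steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names are hypothetical, magnitude spectra are assumed (the text does not say whether magnitude or power is used), the 16 retained bins are assumed to be the first 16 bins of each 0.5-second FFT (0 to 30 Hz at 2 Hz resolution), and random matrices stand in for the learned weights.

```python
import numpy as np

def spectral_features(sample, n_segments=10, n_bins=16):
    """sample: (C, T) EEG array -> (C, n_segments, n_bins) magnitude spectra."""
    C, T = sample.shape
    seg_len = T // n_segments                  # 64 points for 0.5 s at 128 Hz
    segments = sample[:, :seg_len * n_segments].reshape(C, n_segments, seg_len)
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    spectra = np.abs(np.fft.rfft(segments, axis=-1))
    # Assumption: keep the first n_bins bins (0 to 30 Hz at 2 Hz resolution).
    return spectra[:, :, :n_bins]

def se_weights(X, W1, W2):
    """Squeeze-and-excitation weights over the frequency dimension.

    X: (C, T', F) features; W1: (F // r, F); W2: (F, F // r).
    """
    z = X.mean(axis=(0, 1))                    # squeeze: pool over channels and time
    hidden = np.maximum(W1 @ z, 0.0)           # ReLU bottleneck
    return 1.0 / (1.0 + np.exp(-(W2 @ hidden)))  # sigmoid: one weight per bin

rng = np.random.default_rng(0)
sample = rng.standard_normal((30, 640))        # 30 channels, 5 s at 128 Hz
X = spectral_features(sample)                  # shape (30, 10, 16)
F, r = X.shape[-1], 8
W1 = rng.standard_normal((F // r, F)) * 0.1    # random stand-ins for learned weights
W2 = rng.standard_normal((F, F // r)) * 0.1
s = se_weights(X, W1, W2)
X_scaled = X * s                               # broadcast over channels and time
```

With a squeeze ratio of 8 and 16 frequency bins, the bottleneck has only two hidden units, so the SE block adds very few parameters.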
The key and query are the same for the following reasons. We generate the graph using the self-attention mechanism to mimic functional connectivity. Functional connectivity represents the temporal coincidence of two spatially separate neurophysiological events and is nondirectional and symmetric [50]. To generate $A = K Q^{\top}$ as a symmetric functional connectivity matrix, the key $K$ and the query $Q$ need to be the same.
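The symmetry argument can be checked numerically. The sketch below is illustrative only: the function names, the flattening of each channel's features into one vector, and the layer sizes are assumptions, and random matrices replace the learned projection. It contrasts a shared key and query with a transformer-style variant that uses two separate projections.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_adjacency(X_scaled, W3):
    """X_scaled: (C, D) per-channel features; W3: (D, H) projection.

    Returns a symmetric (C, C) adjacency with entries in (0, 1].
    """
    K = np.tanh(X_scaled @ W3)      # key
    Q = K                           # query shares the key, so K @ Q.T is symmetric
    return sigmoid(K @ Q.T)

def attention_adjacency_separate(X_scaled, W_k, W_q):
    """Variant with distinct key and query projections (generally asymmetric)."""
    K = np.tanh(X_scaled @ W_k)
    Q = np.tanh(X_scaled @ W_q)
    return sigmoid(K @ Q.T)

rng = np.random.default_rng(0)
X_scaled = rng.standard_normal((30, 160))   # 30 channels, flattened 10 x 16 features
W3 = rng.standard_normal((160, 32)) * 0.1   # random stand-ins for learned weights
W_q = rng.standard_normal((160, 32)) * 0.1
A_shared = attention_adjacency(X_scaled, W3)
A_separate = attention_adjacency_separate(X_scaled, W3, W_q)
```

Because $K K^{\top}$ equals its own transpose, `A_shared` is symmetric for any weights, whereas `A_separate` is symmetric only by coincidence.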
Then, the normalized Laplacian of the GNN was computed by:

$$L = I - D^{-1/2} A D^{-1/2} \tag{4}$$

where $D$ is the diagonal degree matrix of the normalized adjacency matrix with $D_{ii} = \sum_j A_{ij}$ and $I$ is the identity matrix. To capture the dynamic characteristics of functional networks, in this work, we use a variant of the diffusion convolutional recurrent neural network (DCRNN) [51] for brain dynamics modelling, which can model temporal and spectral dynamics at the same time.
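As a small worked example of the normalized Laplacian, the sketch below assumes the common symmetric form $L = I - D^{-1/2} A D^{-1/2}$; the function name is hypothetical and the three-node path graph is chosen only because its spectrum is easy to verify by hand.

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian of a symmetric, non-negative adjacency A."""
    degree = A.sum(axis=1)
    # Guard against isolated nodes (zero degree) to avoid dividing by zero.
    safe_degree = np.where(degree > 0, degree, 1.0)
    d_inv_sqrt = np.where(degree > 0, 1.0 / np.sqrt(safe_degree), 0.0)
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Three-node path graph: node 1 is connected to nodes 0 and 2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = normalized_laplacian(A)
eigenvalues = np.linalg.eigvalsh(L)   # ascending order
```

For this graph the eigenvalues are 0, 1, and 2, which illustrates the general property that the spectrum of the symmetric normalized Laplacian lies in [0, 2].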
The DCRNN has demonstrated impressive performance in tasks such as traffic forecasting, with the model learning complex spatiotemporal patterns and obtaining accurate predictions [51]. The DCRNN has also shown promise in brain dynamics modelling, enabling researchers to understand the complex interactions and dynamics of neural processes in the brain [22]. In the DCRNN, the encoder encodes the spatiotemporal dependencies of the EEG data as a feature representation using diffusion convolutional gated recurrent units (DCGRUs). Specifically, the encoder consists of two recurrent layers with 32 recurrent units in each layer, and the information is encoded into a set of 30-channel node features ($\mathbb{R}^{30 \times 32}$). In the DCGRU, the matrix multiplication operations of the gated recurrent unit (GRU) [52] are replaced with diffusion convolutions, defined as follows:

$$r^{(t)} = \sigma\big(\Theta_r \star_G [X^{(t)}, H^{(t-1)}] + b_r\big) \tag{5}$$

$$u^{(t)} = \sigma\big(\Theta_u \star_G [X^{(t)}, H^{(t-1)}] + b_u\big) \tag{6}$$

$$C^{(t)} = \tanh\big(\Theta_C \star_G [X^{(t)}, (r^{(t)} \odot H^{(t-1)})] + b_C\big) \tag{7}$$

$$H^{(t)} = u^{(t)} \odot H^{(t-1)} + (1 - u^{(t)}) \odot C^{(t)} \tag{8}$$

where $X^{(t)}$ and $H^{(t)}$ are the input and output at time $t$ of the recurrent layer, $r^{(t)}$ and $u^{(t)}$ are the reset gate and update gate at time $t$, $C^{(t)}$ is the candidate state, $\Theta_r$, $\Theta_u$, and $\Theta_C$ are the filter parameters with biases $b_r$, $b_u$, and $b_C$, $\odot$ represents the Hadamard product, and $\star_G$ denotes the ChebNet spectral convolution.
The DCRNN propagates messages between nodes through diffusion convolution. In the case of an undirected graph, diffusion convolutions were performed using ChebNet spectral graph convolution [53]:

$$X \star_G f_{\theta} = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L}) X \tag{9}$$

where $\theta_k$ represents the learnable weights for the $k$th-order Chebyshev polynomial approximation, $T_k$ is the Chebyshev polynomial of order $k$, $K$ is the number of polynomial terms, and $\tilde{L} = 2L/\lambda_{\max} - I$ is the renormalized Laplacian matrix of the graph, with $\lambda_{\max}$ the largest eigenvalue of $L$.

Finally, the node features are aggregated through max-pooling along the node dimension, followed by two fully connected layers (32 × 16, 16 × 2) that output the probabilities of the two classes. The binary cross-entropy loss function was used for network training:

$$\ell = -\frac{1}{N} \sum_{n=1}^{N} \big[ y_n \log \sigma(x_n) + (1 - y_n) \log\big(1 - \sigma(x_n)\big) \big] \tag{10}$$

where $x_n$ denotes the $n$th sample of the network output, $y_n$ denotes the $n$th sample of the label, $N$ is the number of samples, and $\sigma(\cdot)$ is the sigmoid activation function. The optimization objective is to minimize the binary cross-entropy loss between the network output and the labels.
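The ChebNet spectral graph convolution mentioned above can be sketched with the standard Chebyshev recurrence. This is a generic NumPy illustration rather than the CAGNN implementation: the function name, the argument layout (one weight matrix per polynomial order), and the rescaling by the largest eigenvalue are assumptions taken from the usual ChebNet formulation.

```python
import numpy as np

def cheb_graph_conv(X, L, thetas):
    """Chebyshev spectral graph convolution.

    X: (N, F_in) node features; L: (N, N) normalized Laplacian;
    thetas: list of (F_in, F_out) weight matrices, one per polynomial order.
    """
    n = L.shape[0]
    lambda_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lambda_max - np.eye(n)   # rescale the spectrum into [-1, 1]

    T_prev, T_curr = X, L_tilde @ X              # T_0(L~) X and T_1(L~) X
    out = T_prev @ thetas[0]
    if len(thetas) > 1:
        out = out + T_curr @ thetas[1]
    for theta in thetas[2:]:
        # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
        T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev
        out = out + T_curr @ theta
    return out
```

The recurrence avoids an explicit eigendecomposition of the filter: each additional order costs one more multiplication by the rescaled Laplacian, so a filter with K terms aggregates information from up to K − 1 hops away.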

D. Implementation Details
All the code was implemented with Python 3.8.0. All the models were implemented with PyTorch 1.9.0. The models were trained on a computer with NVIDIA Quadro P5000 graphics cards. The Adam optimizer with an initial learning rate of 1e-3 and a cosine annealing weight decay were used to train the proposed CAGNN. All the models were trained for 30 epochs.

A. Performance Analysis
We used leave-one-subject-out cross-validation since the number of subjects was small. During training, one subject was left out for evaluation, and the data from the other subjects were used for training. All experiments were repeated five times with five different random seeds. All the models were evaluated at the 30th epoch, and all models converged at approximately 15 epochs, as shown in Figure 3. We compared the performance of our CAGNN model with that of state-of-the-art CNN models, including EEGNet and ShallowConvNet, and a recently developed model called InterpretableCNN [21]. InterpretableCNN was originally developed only for offline analysis, and the samples must be sent to the model as a bundle during evaluation. This is not desirable for real-time applications since new samples are usually obtained individually instead of as a bundle. In this article, all the methods were implemented in such a way that real-time data could be used. Note that InterpretableCNN was reimplemented to handle real-time data by turning off the tracking of the running mean and variance in evaluation mode. The results are summarized in Table II.

In addition to accuracy, we used the F1 score, precision, and recall, which are commonly used evaluation metrics for binary classification. The accuracy is defined as the proportion of correct predictions made by the model; the precision measures the proportion of instances predicted as positive by the classifier that were actually positive; the recall, also known as the sensitivity, measures the proportion of actual positive instances that were correctly identified by the classifier; and the F1 score is the harmonic mean of the precision and recall. The accuracy, precision, recall, and F1 score are formulated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{13}$$

$$\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{14}$$

where TP denotes true positives, TN denotes true negatives, FP denotes false positives, and FN denotes false negatives.

With the balanced dataset, our model (with the SE block) outperformed the other three models in terms of overall accuracy, with a mean accuracy of 70.5%. Without the SE block, the accuracy of our model dropped by approximately 1% to 69.7%. In terms of individual subjects, our model outperformed the other models for Subjects 1, 2, and 11 in both accuracy and F1 score. The accuracy of our model was above 58% for every subject, while all the other models had some participants with accuracies below 58%.
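The evaluation protocol and metrics described above can be sketched in plain Python. The helper names are hypothetical, and the drowsy class is encoded as 1 and treated as the positive class, in line with the later description of drowsy samples as the positive class.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with label 1 (drowsy) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Return 0.0 instead of dividing by zero when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                         [1, 1, 0, 0, 0, 1, 0, 1])
```

In this toy example there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics equal 0.75.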
We show the attention scores learned by the SE block in Figure 3. Next, we calculated the average attention scores for four feature bands: delta, theta, alpha, and beta. As shown in Figure 3, the theta band had the highest attention score, followed by the alpha, delta, and beta bands. We then retrained our CAGNN model with different frequency features. Feature clipping with the theta band yielded the highest accuracy (72.6%) and F1 score (70.7%); this is also the frequency band with the highest SE attention score. Feature clipping with the delta band and the alpha band resulted in an accuracy of approximately 67%. Feature clipping in the beta band resulted in the lowest accuracy, and the beta features also had the lowest SE attention score. This might be because the theta band features are unique to drowsiness, whereas the alpha band features coexist in drowsy and alert samples, as illustrated in Figure 6. We also evaluated the performance of EEGNet, ShallowConvNet, and InterpretableCNN with a 5th-order Butterworth filter applied in the delta, theta, alpha, and beta bands. However, the performance of these comparison methods did not improve after applying the filtering process. This may have been due to their original design, which incorporated a temporal convolutional layer with a size of (1, T) meant to handle a wide range of EEG frequency bands. Specifically, ShallowConvNet was inspired by the filter bank common spatial patterns (FBCSP) pipeline [54], which involves separating an EEG signal into different frequency bands for feature extraction. Similarly, in ShallowConvNet [16], the first layer performed a temporal convolution operation that mimicked the bandpass CSP filter in FBCSP. This design has been adopted by many CNN architectures for EEG classification, including EEGNet and InterpretableCNN. In this case, bandpass filtering might have disrupted their ability to capture spatial patterns across the broader frequency spectra, as these models were designed to work effectively without prior bandpass filtering.

TABLE IV PERFORMANCE COMPARISON OF THE CAGNN AND THREE BASELINE METHODS BASED ON THE UNBALANCED DATASET
We also compared the performance of the CAGNN model with other models based on the naturally unbalanced dataset, as shown in Table IV. Our method with feature clipping in the theta band exhibited a slightly lower accuracy than InterpretableCNN and ShallowConvNet. Nevertheless, it outperformed the comparison methods by more than 5% on the metrics suited to unbalanced datasets, namely the F1 score, precision, and recall. These results demonstrate the superior ability of our method in identifying dangerous positive drowsy samples compared to other approaches.
We then compared the accuracy of the CAGNN with the accuracies of other graph generation methods, as shown in Figure 4. We selected a static graph generation method using the correlations between EEG segments proposed by [22] and a learnable graph generation method (SOGNN) [23]. The CAGNN had the highest average accuracy among the three methods. Paired t-tests with Bonferroni correction were used for statistical analysis. The accuracy of the CAGNN model was significantly higher than those of the SOGNN (p = 0.029) and the correlation graph generation method (p = 0.004), while there was no significant difference between the accuracies of the SOGNN and the correlation graph methods.
The SOGNN used different weight matrices for the key and query, as in the transformer, to construct the attention matrix. This design led to a suboptimal result, which can be attributed to the fact that the transformer architecture was originally designed for large-scale datasets such as those in language and image processing domains. In such contexts, employing distinct key and query matrices can introduce additional parameters and facilitate better fitting. However, when working with smaller datasets such as EEG data, this approach may lead to overfitting.

B. Interpretability Analysis
We next investigated the connectivity graphs learned by the CAGNN with features in the theta band in the drowsy and alert conditions, as shown in Figure 5.
To compare the drowsy and alert connectivity, we performed a paired t-test with Bonferroni correction and subtracted the alert connectivity from the drowsy connectivity. All non-significant connections (p ≥ 0.05) were set to 0. We found that there were some differences between these two states in terms of connectivity. First, there was lower connectivity in the occipital region in the drowsy state. This finding was consistent with the findings in [55], [56], [57], and [58], and might be an indication of fading of consciousness [59]. Second, there was higher connectivity in the parietal region in the alert state. This was consistent with the findings in [13] and [60], where significantly weaker connectivity was found within the parietal and occipital regions in drowsy states. In addition, there was lower interregional (e.g., frontal-occipital) connectivity in the drowsy state, which might be due to the local segregation effect and is consistent with the findings in [12]. This might indicate that brain connectivity was transforming to a more efficient architecture to maintain brain function in the drowsy state.
We conducted a post hoc interpretability analysis via an occlusion-based approach [61]. During each iteration, we set one segment of EEG data (half a second of one channel) to zero and calculated the relative change in the model output. This resulted in an occlusion map $O \in \mathbb{R}^{T \times C}$, with $T = 10$ corresponding to the 10 segments of EEG data and $C = 30$ corresponding to the number of channels. The occlusion map was normalized to the range 0 to 1. The strength of the occlusion map indicates the saliency of a specific EEG clip. Four examples of selected occlusion maps are shown in Figure 6. As shown in Figure 6a, the CAGNN model could recognize the alpha spindles and give a correct prediction of a drowsy sample with a high probability of 0.92. In Figure 6b, the CAGNN could capture the beta oscillation and predict an alert sample with a probability of 0.7. In Figure 6c, the CAGNN gave an incorrect prediction of a drowsy sample with a low probability of 0.51 since there were also alert features, namely beta oscillations, in this sample. In Figure 6d, the CAGNN gave an incorrect prediction since alpha spindles and beta oscillations coexisted in this example. For the other subjects, the CAGNN successfully captured alpha spindles in 10 of the 11 subjects with drowsy samples and identified beta oscillations in all 11 subjects with alert samples, as demonstrated in the Supplementary Materials.
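The occlusion procedure described above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: `model_fn` is a hypothetical callable that maps one (channels × timepoints) sample to a scalar probability, the relative change is taken as an absolute value, and min-max scaling is assumed for the normalization.

```python
import numpy as np

def occlusion_map(model_fn, sample, n_segments=10):
    """Occlusion saliency for one EEG sample.

    sample: (C, T) array; model_fn: callable returning a scalar probability.
    Returns an (n_segments, C) map scaled to [0, 1].
    """
    C, T = sample.shape
    seg_len = T // n_segments
    baseline = model_fn(sample)
    occ = np.zeros((n_segments, C))
    for t in range(n_segments):
        for c in range(C):
            occluded = sample.copy()
            # Zero out one half-second clip of one channel.
            occluded[c, t * seg_len:(t + 1) * seg_len] = 0.0
            # Relative change in the output (assumes a non-zero baseline).
            occ[t, c] = abs(model_fn(occluded) - baseline) / abs(baseline)
    span = occ.max() - occ.min()
    return (occ - occ.min()) / span if span > 0 else np.zeros_like(occ)

def toy_model(x):
    # Hypothetical stand-in model: it only looks at channel 1, segment 3.
    return 1.0 / (1.0 + np.exp(-x[1, 192:256].mean()))

saliency = occlusion_map(toy_model, np.ones((30, 640)))
```

For the toy model, only the clip it actually depends on receives a non-zero score, which is the behaviour that makes occlusion maps useful for locating features such as alpha spindles.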
In addition, we analyzed the band power of the highlighted segments (occlusion map score > 0.5) in Supplementary Figure 3. We found that when our model predicted drowsiness, it focused on the segments with higher alpha power, with an average alpha power of 2.43 dB when the model predicted drowsiness and an average value of 2.05 dB when the model predicted alertness; in contrast, when our model predicted alertness, it highlighted segments with higher beta power, with an average beta power of -0.60 dB when the model predicted drowsiness and a value of -0.29 dB when the model predicted alertness. A statistical analysis was performed via an independent t-test, and both the alpha and beta power levels were significantly different (p < 0.01).

TABLE V PERFORMANCE COMPARISON AMONG DIFFERENT VARIANTS OF THE CAGNN

V. DISCUSSION
In this article, we proposed a task-relevant graph generation method that can outperform other state-of-the-art CNN baselines on a driver drowsiness dataset. First, we applied a self-attention mechanism to generate connectivity in the GNN and showed that it can outperform other graph generation methods. Second, we utilized an SE attention block to improve the overall performance by 1% and showed that the SE attention could correctly indicate the informative features: when we retrained the model using the band with the highest SE attention score, the accuracy improved by 2%. We first tested our model in the balanced dataset setting and showed that our method can surpass other methods by 3% in terms of accuracy, which helps researchers to establish a baseline understanding of this technology. We then tested on the imbalanced dataset and showed that our model could surpass other methods in terms of unbalanced dataset metrics, including the F1 score, precision, and recall, which means that our method identifies the dangerous positive drowsy class well. Notably, the high precision and recall of our method indicate that it can effectively avoid false positives and false negatives. By combining both balanced and imbalanced data testing, we can gain a more comprehensive understanding of the technology's strengths, weaknesses, and potential for real-world application. Finally, we showed that our generated connectivity had lower occipital and parietal connectivity and lower interregional connectivity in the drowsy state, which is consistent with some neuroscience findings and could provide some biological insights into EEG research.

A. Effects of Each Building Block of the CAGNN
Our method leverages several deep learning architectures, including self-attention, a graph neural network, and a gated recurrent unit (GRU). To investigate the benefit of each individual component of our network, we analyzed the performance of several variants of our proposed method, as shown in Table V. First, we investigated the effect of self-attention by replacing it with the correlation between channels, as proposed in [22]. This substitution decreased the accuracy by approximately 3%, possibly because the self-attention mechanism can exploit the rich context between EEG channels to adaptively learn the brain network according to the downstream task. Second, we investigated the effect of the graph neural network by replacing the encoder with a CNN. The implementation followed the CNN-LSTM proposed in [62], with the LSTM replaced by a GRU. The CNN encoder in this variant consists of two convolutional layers (32 kernels of size 3 × 3) that extract features from the EEG time series. This replacement decreased the accuracy significantly, which implies that the graph neural network is better suited to encoding the brain network because it can capture the intricate relationships between EEG channels. Third, we examined the effect of the GRU by removing it and performing classification directly after the graph convolution via a feedforward network consisting of two fully connected layers (32 × 16, 16 × 2). This also decreased the accuracy significantly, which illustrates that the GRU is beneficial for EEG classification because it can model dynamic temporal information.
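To make the self-attention variant concrete, the following minimal sketch shows how scaled dot-product attention can turn per-channel feature vectors into a row-normalized connectivity matrix. It is an illustration under stated assumptions, not the CAGNN implementation: the function names, the projection matrices W_q and W_k, and the toy inputs are all hypothetical, and plain Python is used so that each step is explicit.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def attention_adjacency(features, W_q, W_k):
    """Return a channel-by-channel adjacency matrix whose rows sum to 1.

    features: one feature vector per EEG channel.
    W_q, W_k: hypothetical stand-ins for learned query/key projections.
    """
    queries = [matvec(W_q, f) for f in features]
    keys = [matvec(W_k, f) for f in features]
    scale = math.sqrt(len(queries[0]))
    scores = [
        [sum(q * k for q, k in zip(qi, kj)) / scale for kj in keys]
        for qi in queries
    ]
    return [softmax(row) for row in scores]

# Toy input (assumed): 3 channels, 2 features each, identity projections.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
adjacency = attention_adjacency(features, identity, identity)
```

Because the projections would be trained together with the classifier, the resulting edge weights depend on the task loss, whereas a correlation matrix is fixed by the signal alone. In this toy input, channel 0 attends more strongly to the similar channel 1 than to the dissimilar channel 2.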

B. The Information Learned by the SE Block
In conventional graph neural networks [22], the input feature vectors are fed directly into the graph convolutional layer. In our approach, we introduced the SE block before the convolutional layer so that it can capture the interdependencies between different frequency features and increase the sensitivity of the model to the more informative ones. To exploit the dependencies between the features, the squeeze part of the SE block uses a global average pooling layer to learn statistics between features, as depicted in equation (1). The excitation part uses a bottleneck gating mechanism, as depicted in equation (2), to generate the nonlinear interactions between features. We visualized the attention weights learned by the SE block and showed that they could be useful for feature selection. The theta band feature had the highest SE attention score, and when we retrained the model with only the theta band feature, the accuracy improved by approximately 2%. Therefore, the attention score of the SE block can reveal informative features.
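The squeeze and excitation steps described above can be sketched as follows. This is a minimal illustration, assuming the common SE formulation (global average pooling, then a reduce-ReLU-expand-sigmoid bottleneck); the weights W1 and W2, the reduction ratio, and the input values are arbitrary placeholders, not learned parameters or data from this study.

```python
import math

def se_block(x, W1, W2):
    """Apply a squeeze-and-excitation gate to x.

    x:  list of F feature rows, each holding one value per EEG channel.
    W1: (F/r x F) reduction weights; W2: (F x F/r) expansion weights.
        Both are hypothetical stand-ins for learned parameters.
    Returns the rescaled features and the per-feature attention scores.
    """
    # Squeeze: global average pooling gives one statistic per feature.
    z = [sum(row) / len(row) for row in x]
    # Excitation: bottleneck gating (reduce -> ReLU -> expand -> sigmoid).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in W1]
    scores = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in W2
    ]
    # Scale: reweight every feature row by its attention score.
    scaled = [[s * v for v in row] for s, row in zip(scores, x)]
    return scaled, scores

# Toy input (assumed): 4 frequency features over 3 channels.
x = [[0.2, 0.1, 0.3], [0.9, 1.1, 1.0], [0.5, 0.4, 0.6], [0.1, 0.2, 0.0]]
W1 = [[0.5, 1.0, 0.5, -0.5], [-0.5, 0.5, 1.0, 0.5]]       # 4 -> 2
W2 = [[-1.0, 0.5], [2.0, 0.0], [0.5, 1.0], [-0.5, -1.0]]  # 2 -> 4
scaled, scores = se_block(x, W1, W2)

# Index of the feature with the highest attention score.
best_feature = max(range(len(scores)), key=scores.__getitem__)
```

Ranking the entries of `scores` is the step that corresponds to using the SE attention for feature selection. With these placeholder weights the ranking carries no physiological meaning.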

C. Biological Insights of the Generated Connectivity and Our Model
Our model could potentially reveal more connectivity information than traditional CNNs through end-to-end training. In contrast to static connectivity methods, our learnable connectivity patterns are relevant to the downstream task and thus could increase classification accuracy. In addition, we found that some drowsy connectivity features were identifiable in our generated connectivity. For instance, we found lower connectivity in the occipital region in the drowsy state. This is consistent with our hypothesis, since our task involved sustained visual attention and the occipital lobe is strongly associated with visual information processing [63]. The lower connectivity in the occipital region might reflect the cortical gate mechanism that shields the brain from external sensory input [64].
Furthermore, our model revealed lower interregional connectivity in drowsiness, which might indicate that the brain network shifts toward more efficient local information processing in drowsy states. This might reflect a local segregation effect, as described in [65]. We also examined why the model sometimes made a wrong prediction. Drowsy features are very complex, and the coexistence of drowsy and alert features (e.g., alpha spindles and beta oscillations in Figure 6c) is very common, so the model sometimes receives conflicting evidence and makes an incorrect decision. Such samples may indicate that the participant was struggling against drowsiness while trying to perform the task well.

D. Challenges and Further Research Directions
EEG data are intrinsically more difficult to explain than image or natural language data. Therefore, in the future, we will explore explainable methods for graphs, such as graph attention networks [66], to improve explainability. Moreover, although our model had a smaller intersubject variance than other CNN-based models, the intersubject variance was still high. The leave-one-out accuracy on the balanced dataset ranged from 58.3% to 87.1%, which means that the data distributions still differed greatly across subjects. Meta-learning may be a way to address this problem in the future [67]. Furthermore, we performed experiments on only one dataset. Thus, we will test whether our method can be transferred to other neuroscience datasets. Another limitation of our dataset is that only sessions without motion have sufficient data for analysis. In the future, we will include more sessions with motion to evaluate our model in a more realistic setting. Additionally, in this work, a high-density 32-channel EEG system was used to capture the entire brain network. In follow-up work, we may limit EEG measurements to only the most critical regions, which has the potential to significantly reduce the setup time in future studies.
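The per-subject spread reported above comes from a leave-one-out evaluation. The sketch below assumes that the protocol holds out one subject at a time; the subject IDs, labels, and predictions are invented toy values, and the model-fitting step is deliberately omitted.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx) for each unique subject."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: 6 samples from 3 subjects.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
labels = [0, 1, 0, 1, 0, 1]
predictions = [0, 1, 0, 0, 1, 1]  # stand-in for per-fold model outputs

per_subject = {}
for held_out, train_idx, test_idx in leave_one_subject_out(subject_ids):
    # A real run would fit the model on train_idx before predicting.
    per_subject[held_out] = accuracy(
        [labels[i] for i in test_idx],
        [predictions[i] for i in test_idx],
    )

# Gap between the best and worst held-out subject.
spread = max(per_subject.values()) - min(per_subject.values())
```

Reporting the per-subject minimum and maximum alongside the mean makes the intersubject variability visible, which a single averaged accuracy would hide.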

VI. CONCLUSION
In this work, we proposed a novel connectivity-aware graph neural network (CAGNN) that is both connectivity-aware and task-aware. Our method achieved state-of-the-art performance for real-time drowsiness classification on both balanced and imbalanced datasets. We introduced the SE block to highlight the most important features, and we generated the graphs with our GNN model through self-attention mechanisms via end-to-end training. Furthermore, we showed that this approach not only improved the performance of our model but also provided meaningful neurophysiological explanations in terms of brain connectivity (e.g., local segregation) in drowsy states. We also demonstrated that our model could identify drowsy features such as alpha spindles. Our method could be applied in real-time driver drowsiness monitoring systems in the future.
are the bases of the Chebyshev polynomial. The diffusion convolutional layer in the encoder works by propagating node features across the graph structure. The diffusion operation captures the local and global dependencies of the nodes in the graph, allowing the network to learn representations that capture the complex interactions and correlations between the different EEG nodes.
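The propagation idea can be illustrated with a simplified K-step diffusion, computed as the sum over k of theta_k P^k X, where P = D^-1 A is the random-walk transition matrix. This sketch is an assumption-laden simplification: it uses plain powers of P rather than Chebyshev bases, and the weights `thetas`, the graph, and the signal are hypothetical.

```python
def row_normalize(A):
    """Turn adjacency A into a random-walk transition matrix D^-1 A."""
    normalized = []
    for row in A:
        degree = sum(row)
        normalized.append([v / degree if degree else 0.0 for v in row])
    return normalized

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def diffusion_conv(A, X, thetas):
    """Return sum_k thetas[k] * P^k X with P = D^-1 A.

    thetas: hypothetical per-step weights (learned in a real layer).
    """
    P = row_normalize(A)
    step = [row[:] for row in X]  # P^0 X
    out = [[0.0] * len(X[0]) for _ in X]
    for theta in thetas:
        out = [
            [o + theta * s for o, s in zip(out_row, step_row)]
            for out_row, step_row in zip(out, step)
        ]
        step = matmul(P, step)  # advance one diffusion step
    return out

# Toy chain graph 0 - 1 - 2 with a signal on node 0 only.
A = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
X = [[1.0], [0.0], [0.0]]
one_hop = diffusion_conv(A, X, [1.0, 1.0])
two_hop = diffusion_conv(A, X, [1.0, 1.0, 1.0])
```

With one diffusion step the signal reaches only the direct neighbor of node 0; with two steps it also reaches node 2. This is how stacking diffusion steps moves from local to more global dependencies.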

Fig. 2. Attention score of the SE block over different frequency features (upper panel) and frequency bands including delta, theta, alpha, and beta bands (lower panel). (Error bar: standard deviation.)

Fig. 5. Generated CAGNN connectivity in the drowsy state (upper panel), the alert state (middle panel), and the difference between the drowsy and alert states (lower panel). All nonsignificant differences (p ≥ 0.05) were set to 0.

Fig. 6. Occlusion map analysis of the CAGNN for selected samples: (a) correctly classified drowsy sample, (b) correctly classified alert sample, (c) incorrectly classified drowsy sample, and (d) incorrectly classified alert sample. The channels are ordered in the same way as in Figure 5.

TABLE II PERFORMANCE COMPARISON OF THE CAGNN AND THREE BASELINE METHODS ON THE BALANCED DATASET

TABLE III PERFORMANCE OF THE CAGNN IN DIFFERENT FREQUENCY BANDS BASED ON THE BALANCED DATASET