Self-Attentive Channel-Connectivity Capsule Network for EEG-Based Driving Fatigue Detection

Deep neural networks have recently been successfully extended to EEG-based driving fatigue detection. Nevertheless, most existing models fail to reveal the intrinsic inter-channel relations that are known to be beneficial for EEG-based classification. Additionally, these models require substantial data for training, which is often impractical due to the high cost of data collection. To simultaneously address these two issues, we propose a Self-Attentive Channel-Connectivity Capsule Network (SACC-CapsNet) for EEG-based driving fatigue detection in this paper. SACC-CapsNet starts with a temporal-channel attention module to investigate the critical temporal information and important channels for driving fatigue detection, refining the input EEG signals. Subsequently, the refined EEG data are transformed into a channel covariance matrix to capture the inter-channel relations, followed by selective kernel attention to extract the highly discriminative channel-connectivity features. Finally, a capsule neural network is employed to effectively learn the relationships between connectivity features, which is more suitable for limited data. To confirm the effectiveness of SACC-CapsNet, we collected 24-channel EEG data from 31 subjects (mean age=23.13±2.68 years, male/female=18/13) in a simulated fatigue driving environment. Extensive experiments were conducted with the acquired data, and the comparison results show that our proposed model outperforms state-of-the-art methods. Additionally, the channel covariance matrix learned from SACC-CapsNet reveals that the frontal pole is most informative for detecting driving fatigue, followed by the parietal and central regions. Intriguingly, the temporal-channel attention module can enhance the significance of these critical regions, and the reconstructed channel covariance matrix generated by the decoder network of SACC-CapsNet can effectively preserve valuable information about them.


I. INTRODUCTION
Drivers in a fatigued state are prone to losing their ability to make sound judgments and to quickly and accurately respond in an emergency, making fatigue one of the most common causes of fatal traffic accidents. Driving fatigue is considered responsible for 20%-30% of all traffic accidents [1], posing a serious threat to road safety. Thus, it is critical to establish a reliable and accurate driving fatigue detection model.
A variety of indicators have been employed to monitor driving fatigue; they can be roughly grouped into two categories. The first category of indicators is based on the driver's behavioral performance, such as facial expression [2] and the vehicle's state [3]. The second category of indicators is based on physiological signals, including electroencephalography (EEG) [4], [5], electromyography [6], electrooculography [7], and electrocardiography [8]. Among these available physiological indicators, EEG directly captures electrical activities originating from the brain, providing extensive information about human cognitive processes, and is substantially linked with the fatigued state [9], [10].
Both shallow learning methods (e.g., support vector machines) and deep learning methods (e.g., convolutional neural networks) have been utilized to estimate the fatigued driving state from EEG with promising results. Overall, shallow learning methods usually involve two steps: handcrafted feature extraction and classification. Frequency-domain features (e.g., the power spectrum [11], [12] and entropy [13], [14]) and brain connectivity features [9] (e.g., the phase lag index (PLI) [15], phase locking value (PLV) [16], and partial directed coherence (PDC) [17]) are the most commonly utilized handcrafted features for driving fatigue measurement. Handcrafted feature extraction is considered tedious, time-consuming, and heavily reliant on domain-specific professional knowledge. Based on handcrafted features, previous research has offered satisfactory solutions for driving fatigue detection. For example, in [18], integrated entropy features were fed into support vector machines for driving fatigue detection, yielding promising outcomes. In [15], the PLI feature was evaluated using four kinds of classifiers, i.e., BP-Adaboost, random forest, relevance vector machine, and support vector machine, demonstrating the high discriminability of the PLI feature in driving fatigue detection. However, these methods cannot be trained in an end-to-end manner to optimize parameters comprehensively. Consequently, some crucial information may be lost, leading to suboptimal performance.
On the other hand, deep learning methods have shown superior performance in driving fatigue detection. Cheng et al. [19] transformed recorded EEG signals into image-like data and then passed them to a CNN for driving fatigue prediction. Zeng et al. [20] combined a CNN with deep residual learning for driving fatigue prediction. Hajinoroozi et al. [21] introduced a channel-wise CNN for identifying the fatigue state based on raw EEG data. Gao et al. [22] proposed a spatial-temporal CNN for raw EEG data, considering the inter-channel relations. Xu et al. [23] proposed a convolutional attention neural network for driving fatigue detection. CNN models have been demonstrated to be competitive in detecting driving fatigue. Nevertheless, typical CNN-based methods suffer from the following issues: 1) They only account for the relationships among channels that are close to each other, resulting in the loss of some critical information from distant channels. 2) They assume that a large amount of data is available for learning to achieve satisfactory performance, which is usually impractical due to the high cost of data collection.
In recent years, Sabour et al. [24] introduced a new type of network called the capsule neural network (CapsNet), which has demonstrated its effectiveness, particularly in scenarios with limited training data. CapsNet is composed of capsules, with each capsule consisting of multiple neurons and outputting a vector. The length of the vector represents the posterior probability of a feature extracted by the capsule, whereas the orientation of the vector indicates the instantiation parameters (position, size, rotation, and so on). Dynamic routing, a mechanism more powerful than the pooling function used in CNNs, is employed between two capsule layers to ensure the appropriate routing of the lower-level capsule's output to relevant higher-level capsules. As a result, CapsNet can effectively model the relationships between local features and global features, overcoming the CNN's limitation of not being able to learn relationships between elements; hence, it is more suitable for limited data. Given these benefits, CapsNet and its variants have been recently extended to EEG-based classification with satisfactory results [25], [26], [27].
However, CapsNet has limitations when applied to EEG-based driving fatigue detection tasks because it fails to explore the critical temporal and channel information and the inter-channel relations essential for driving fatigue detection. Therefore, we propose the Self-Attentive Channel-Connectivity Capsule Network (SACC-CapsNet) for EEG-based driving fatigue detection. Starting with the raw EEG data, SACC-CapsNet employs a temporal-channel attention mechanism to autonomously investigate the critical temporal information and crucial channels for driving fatigue detection, producing refined EEG data. Next, the refined EEG data are transformed into a spatial EEG covariance matrix to characterize the inter-channel relations, followed by a selective kernel attention mechanism [28] to extract the highly discriminative features at varying scales, referred to as convoluted connectivity features. Finally, these convoluted connectivity features are fed into the capsule network to leverage its benefits, such as dynamic routing to learn relationships between elements, which is more suitable for limited data.
To confirm the effectiveness of SACC-CapsNet, we collected 24-channel EEG data from 31 subjects (mean age=23.13±2.68 years, male/female =18/13) in a simulated fatigue driving environment. Each subject repeated the same driving tasks twice with an interval of approximately one week or longer, yielding two sessions of fatigue-related data per subject. Extensive experiments have been conducted with the acquired data, and the experimental results show that the proposed SACC-CapsNet outperforms multiple competitive models in terms of accuracy, sensitivity, specificity, and F1-score. Based on the covariance matrix learned from SACC-CapsNet, we have made the following observations regarding the alterations of the covariance matrix from the alert state to the fatigue state. The strongest alterations are associated with the frontal pole, followed by the parietal and central regions. Moreover, these critical brain regions can be enhanced through the application of temporal-channel attention. Lastly, the reconstructed channel covariance matrix generated by the decoder network of SACC-CapsNet effectively preserves a significant amount of information pertaining to these crucial brain regions.
The arrangement of our study is structured as follows. Section II presents the SACC-CapsNet model with its modules and implementation details. The results, including comparisons with multiple competitive models, ablation studies, an analysis of critical connections and crucial brain regions, and the performance of SACC-CapsNet across sessions and subjects, are reported in Section III. Section IV provides the conclusions.

Fig. 1. (a) An EEG trial X (with C channels and T sampling points) is fed into the temporal-channel attention module to autonomously investigate critical temporal information and crucial channels for driving fatigue detection. (b) The produced refined EEG data X̃ are transformed into spatial EEG covariance matrices Σ̃ to characterize the inter-channel relations, followed by a selective kernel attention mechanism to extract the highly discriminative features at varying scales. (c) The produced convoluted connectivity features are fed into the capsule network to leverage its benefits. In the capsule network module, most model architecture parameters are set according to the original CapsNet parameters for convenience. In the SACC-CapsNet, only the model architecture parameters D and E are required to be tuned.

II. METHODOLOGY

A. Model Overview
SACC-CapsNet consists of three modules: temporal-channel attention, channel-connectivity attention, and a capsule network. As depicted in Fig.1, given an EEG trial X ∈ R^(C×T) (C channels and T sampling points) as input, SACC-CapsNet first infers a temporal attention map M_t ∈ R^(T×T) and a channel attention map M_c ∈ R^C to autonomously investigate the critical temporal information and crucial channels for driving fatigue. Then, the final refined EEG trial X̃ is transformed into a spatial EEG covariance matrix Σ̃ ∈ R^(C×C), referred to as the channel-connectivity features, to reveal the interactions between multiple brain regions (i.e., inter-channel relations). Next, selective kernel attention is applied to the connectivity features Σ̃ to automatically capture the highly discriminative channel-connectivity features at varying scales, referred to as convoluted connectivity features. Finally, the convoluted connectivity features are fed into the capsule neural network for further feature extraction and classification, which leverages the benefits of dynamic routing that learns relationships between elements. In this work, SACC-CapsNet performs a binary classification task of distinguishing between the alert state and the fatigue state. We describe each module in the following subsections.
B. Temporal-Channel Attention Module

1) Temporal Attention: The feature tensor X is passed to a self-attention layer to reduce the signal noise, obtaining the temporal-refined EEG signals X̄ ∈ R^(C×T):

X̄ = X M_t,    (1)

where the temporal attention map M_t ∈ R^(T×T) is normalized by column. Notably, based on Formula (1), each row of X (the EEG signal of each channel) is independently updated, preventing the mixing of inter-channel information. Our adopted temporal attention acts as a channel-wise convolution, according to the relationship between self-attention and the convolutional layers [29].
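As a concrete illustration of column-normalized temporal attention, the following NumPy sketch builds the attention map from a plain dot-product similarity X^T X; the learned similarity used in the actual model is not reproduced here, so treat that choice as an illustrative assumption. Note that X @ M_t mixes information only across time points, never across channels.

```python
import numpy as np

def temporal_attention(X):
    """Hedged sketch of column-normalized temporal self-attention.

    X: (C, T) EEG trial. Returns (X_bar, M_t), where M_t (T x T) is a
    column-normalized (softmax per column) similarity map and
    X_bar = X @ M_t updates each channel (row) independently.
    """
    S = X.T @ X                                   # (T, T) time-point similarity
    S = S - S.max(axis=0, keepdims=True)          # numerical stability
    M_t = np.exp(S) / np.exp(S).sum(axis=0, keepdims=True)  # normalize by column
    return X @ M_t, M_t

rng = np.random.default_rng(0)
X = rng.standard_normal((24, 250))                # C=24 channels, T=250 points
X_bar, M_t = temporal_attention(X)
```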
2) Channel Attention: Inspired by the biological evidence of driving fatigue and our previous works [15], [30], we include a channel-attention mechanism to allow the model to autonomously investigate the critical channels for driving fatigue. Specifically, we leverage the temporal-refined EEG signals X̄ to learn a weight α_i for each channel. We first apply a channel averaging function F_c(·) to obtain the global feature representation of X̄:

g = F_c(X̄) = (1/C) Σ_{i=1}^{C} X̄_i,

where X̄_i ∈ R^T denotes the i-th channel of X̄. We utilize the global feature to further enhance the critical brain regions (i.e., channels). The weight α_i for each channel is determined by measuring the similarity between the global feature g and the local feature of each channel X̄_i using a simple dot product:

α_i = g · X̄_i.

Following [31], [32], [33], and [34], we normalize α_i, scale and shift the normalized value, and apply a sigmoid gate σ(·) to obtain the channel attention weight vector M_c ∈ R^C. The channel-refined EEG signals X̂ ∈ R^(C×T) are obtained by

X̂ = M_c ⊙ X̄,

where ⊙ represents element-wise multiplication, and the attention values are correspondingly broadcast throughout the multiplication.
For easy optimization and to ensure that all channels are taken into account without losing information, a residual connection is incorporated from the raw EEG signals X to the channel-refined EEG signals X̂, obtaining the final refined features X̃:

X̃ = X + X̂.

Notably, the use of local and global similarities as attention masks is a common technique in computer vision [31]. The temporal-channel attention module improves the discriminative ability of EEG signals for driving fatigue detection, potentially enhancing the significance of critical brain regions for fatigue detection (to be detailed in Fig.7, Section III).
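The channel-attention steps above (channel average, dot-product similarity, normalization with a scale/shift, sigmoid gate, residual from the raw trial) can be sketched as follows. The learned scale and shift are replaced by illustrative constants gamma and beta, and the temporal-refined input is stood in for by the raw trial for brevity; both are assumptions of this sketch.

```python
import numpy as np

def channel_refine(X, X_bar, gamma=1.0, beta=0.0):
    """Sketch of channel attention with a residual from the raw trial.

    X:     (C, T) raw EEG trial.
    X_bar: (C, T) temporal-refined signals (here approximated by X).
    gamma, beta: illustrative stand-ins for the learned scale/shift.
    """
    g = X_bar.mean(axis=0)                              # (T,) global feature
    alpha = X_bar @ g                                   # (C,) channel similarities
    alpha = (alpha - alpha.mean()) / (alpha.std() + 1e-8)   # normalize
    M_c = 1.0 / (1.0 + np.exp(-(gamma * alpha + beta))) # sigmoid gate, (C,)
    X_hat = M_c[:, None] * X_bar                        # broadcast over time
    return X + X_hat                                    # residual: X_tilde = X + X_hat

rng = np.random.default_rng(1)
X = rng.standard_normal((24, 250))
X_tilde = channel_refine(X, X)      # using X itself as the temporal-refined input
```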

C. Channel-Connectivity Attention Module
The refined EEG trial X̃ = [x̃_1, . . . , x̃_C]^T ∈ R^(C×T) is transformed into the refined spatial EEG covariance matrix Σ̃ ∈ R^(C×C) to capture the interaction information of multiple brain regions by characterizing the inter-channel relations. Σ̃_ij is defined as

Σ̃_ij = (1/(T−1)) (x̃_i − x̄_i)(x̃_j − x̄_j)^T,

where x̄_i ∈ R^T, all the elements of x̄_i are identical, each representing the mean value of the elements in x̃_i. The spatial EEG covariance matrix Σ̃ is then fed into the selective kernel attention module [28] at varying scales to automatically capture the highly discriminative channel-connectivity features (referred to as convoluted channel-connectivity features). Selective kernel attention consists of the following three operators: Split, Fuse, and Select.

1) Split: Given Σ̃ ∈ R^(C×C), three convolution operations F_1: Σ̃ → Ũ ∈ R^(C×C×D), F_2: Σ̃ → Û ∈ R^(C×C×D), and F_3: Σ̃ → Ǔ ∈ R^(C×C×D) are performed with D filters of kernel sizes 3, 5, and 7, respectively. Afterward, batch normalization and the ReLU function are applied in order. In this paper, D is referred to as the depth of the convoluted connectivity features.
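The covariance computation above is the familiar sample covariance between every pair of channels; a minimal sketch:

```python
import numpy as np

def channel_covariance(X_tilde):
    """Spatial EEG covariance matrix (C x C) from a refined trial (C x T).

    Sigma_ij = (x_i - mean(x_i)) . (x_j - mean(x_j)) / (T - 1),
    i.e. the unbiased sample covariance between channels i and j.
    """
    T = X_tilde.shape[1]
    Xc = X_tilde - X_tilde.mean(axis=1, keepdims=True)  # remove per-channel mean
    return (Xc @ Xc.T) / (T - 1)

rng = np.random.default_rng(3)
X = rng.standard_normal((24, 250))
Sigma = channel_covariance(X)
```

For a 24-channel trial this yields a symmetric 24 × 24 matrix whose diagonal holds the per-channel variances, matching `np.cov` with default settings.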
2) Fuse: The information from Ũ, Û, and Ǔ is integrated via an element-wise summation:

U = Ũ + Û + Ǔ.

Subsequently, global average pooling is employed to squeeze the channel-connectivity features for each convoluted covariance matrix U_d (d = 1, 2, . . . , D) independently, producing the statistics descriptors s = {s_d}_{d=1}^{D}. Specifically, the d-th element of s is obtained by

s_d = (1/(C×C)) Σ_{i=1}^{C} Σ_{j=1}^{C} U_d(i, j).

Additionally, a feature descriptor z ∈ R^(D/2×1) is developed to offer guidance for accurate and adaptive selection, which is accomplished by a fully connected layer:

z = δ(B(W s)),
where δ denotes the ReLU function, B represents batch normalization, and W ∈ R^(D/2 × D).

3) Select: A soft attention is employed to adaptively select different scales of channel-connectivity information based on the feature descriptor z. Specifically, the attention weights for the three branches are obtained by applying branch-specific fully connected layers to z, followed by a softmax across the branches. The attention weights are multiplied with the learned representations Ũ, Û, and Ǔ to produce the final feature map V,
where V = {V_d}_{d=1}^{D} ∈ R^(C×C×D) and V_d ∈ R^(C×C). The split operation generates multiple convoluted connectivity feature maps, which allows the model to focus on different aspects of the connectivity features, making it more effective in capturing more complex patterns. The fuse operation combines and aggregates multiple convoluted connectivity feature maps to obtain a global and comprehensive representation. The select operation can further refine the feature representation by focusing on the most relevant representations. Overall, using these operations helps capture the highly discriminative channel-connectivity features, leading to improved classification performance.
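A minimal numerical sketch of the Fuse and Select operators, with the three convolutional branches abstracted as given (C, C, D) feature maps and randomly initialized projections standing in for the learned weights (batch normalization is omitted for brevity; all of these simplifications are assumptions of the sketch):

```python
import numpy as np

def sk_fuse_select(U_branches, W, A):
    """Fuse + Select of selective kernel attention (sketch).

    U_branches: list of three (C, C, D) branch feature maps (Split outputs).
    W: (D//2, D) fuse projection producing the descriptor z.
    A: (3, D, D//2) per-branch selection weights.
    """
    U = sum(U_branches)                              # Fuse: element-wise sum
    C = U.shape[0]
    s = U.reshape(C * C, -1).mean(axis=0)            # (D,) global average pooling
    z = np.maximum(W @ s, 0.0)                       # (D//2,) descriptor via ReLU
    logits = np.stack([A[b] @ z for b in range(3)])  # (3, D) branch logits
    a = np.exp(logits) / np.exp(logits).sum(axis=0)  # softmax across branches
    V = sum(a[b][None, None, :] * U_branches[b] for b in range(3))
    return V, a

rng = np.random.default_rng(4)
C, D = 24, 8
branches = [rng.standard_normal((C, C, D)) for _ in range(3)]
W = rng.standard_normal((D // 2, D))
A = rng.standard_normal((3, D, D // 2))
V, a = sk_fuse_select(branches, W, A)
```

The softmax across branches guarantees that, for each depth d, the three branch weights sum to one, so V_d is a convex combination of the three scales.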

D. Capsule Network Module
The convoluted connectivity features are then employed as inputs to the Capsule Network, which consists of the PrimaryCaps layer and the OutputCaps layer. In the capsule network module, most hyperparameters are set according to the original CapsNet model [24] for convenience.
The PrimaryCaps layer contains E×8 convolutional kernels of size 9 × 9 with a stride of 2 and outputs E channels of convolutional 8D capsules on an 8 × 8 grid, where capsules in the same grid share weights. Ultimately, this layer generates E×64 8D primary capsules.
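The stated 8 × 8 grid and E×64 capsule count follow from the standard valid-convolution output-size formula applied to the C × C convoluted connectivity input with C = 24:

```python
def conv_out(size, kernel=9, stride=2):
    """Output side length of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

C, E = 24, 24                 # C x C covariance input; E tuned in Section II-E
grid = conv_out(C)            # (24 - 9) // 2 + 1 = 8
primary_capsules = E * grid * grid   # E channels of 8 x 8 grids of 8D capsules
```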
The OutputCaps layer is designed as a fully connected capsule layer that receives its input from the PrimaryCaps layer and produces two 16D output capsules: an alert capsule and a fatigue capsule. These capsules correspond to the two categories of input samples.
Different from a traditional fully connected layer, CapsNet employs a dynamic routing mechanism to couple the capsules in the PrimaryCaps layer to the capsules in the OutputCaps layer as follows [24]. Given the i-th output vector of the primary capsule u_i, the total input of the j-th output capsule, h_j, is obtained via a transformation matrix W_ij:

û_{j|i} = W_ij u_i,
h_j = Σ_{i=1}^{n} c_ij û_{j|i},

where û_{j|i} denotes the prediction vector from primary capsule i to output capsule j, n denotes the number of primary capsules, and c_ij represents the coupling coefficient from primary capsule i to output capsule j. The output vector v_j of output capsule j is obtained by performing a squashing operation:

v_j = (||h_j||^2 / (1 + ||h_j||^2)) (h_j / ||h_j||).

The coupling coefficient c_ij is updated by the iterative dynamic routing process, which is determined as follows:

c_ij = exp(b_ij) / Σ_{k=1}^{m} exp(b_ik),

where m denotes the number of output capsules, and b_ij denotes the logarithmic prior probability that primary capsule i is coupled to output capsule j, which is initialized to 0 and updated as follows:

b_ij ← b_ij + û_{j|i} · v_j.

The model parameters are updated by leveraging the L2-norm margin loss. The margin loss L_j for each output capsule j is given as follows:

L_j = T_j max(0, m^+ − ||v_j||)^2 + λ (1 − T_j) max(0, ||v_j|| − m^−)^2,

where T_j = 1 if and only if a sample of class j is present, and m^+ = 0.9, m^− = 0.1, and λ = 0.5 are set according to the original CapsNet parameters [24].
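The squashing, routing, and margin-loss equations above can be sketched directly in NumPy. The three routing iterations follow the original CapsNet; the random prediction vectors û_{j|i} are placeholders that stand in for the transformed primary-capsule outputs.

```python
import numpy as np

def squash(h):
    """v = (|h|^2 / (1 + |h|^2)) * h / |h|, so that |v| < 1 always."""
    n2 = (h ** 2).sum()
    return (n2 / (1.0 + n2)) * h / (np.sqrt(n2) + 1e-9)

def dynamic_routing(u_hat, iters=3):
    """u_hat: (n, m, d) prediction vectors from n primary to m output capsules."""
    n, m, _ = u_hat.shape
    b = np.zeros((n, m))                                  # log priors, init 0
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # coupling coeffs
        h = (c[:, :, None] * u_hat).sum(axis=0)           # (m, d) total inputs
        v = np.stack([squash(h[j]) for j in range(m)])    # (m, d) outputs
        b = b + np.einsum('nmd,md->nm', u_hat, v)         # agreement update
    return v

def margin_loss(v, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Sum over capsules of T_j max(0, m+ - |v_j|)^2 + lam (1-T_j) max(0, |v_j| - m-)^2."""
    lengths = np.linalg.norm(v, axis=1)
    return (T * np.maximum(0, m_pos - lengths) ** 2
            + lam * (1 - T) * np.maximum(0, lengths - m_neg) ** 2).sum()

rng = np.random.default_rng(5)
u_hat = 0.1 * rng.standard_normal((64, 2, 16))  # 64 primaries, 2 outputs, 16D
v = dynamic_routing(u_hat)
loss = margin_loss(v, np.array([1.0, 0.0]))     # label: first class present
```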
We employ a decoder network, which consists of fully connected (FC) layers, to reconstruct the raw spatial EEG covariance matrix Σ (calculated from the raw EEG trial X). Following the original CapsNet design [24], the first and second layers have 512 and 1,024 neurons, respectively, while the third layer contains C×C neurons, equal to the total number of elements of Σ. The reconstruction loss, which serves as a regularizer and prevents overfitting, is defined as

L_r = ||Σ − Σ'||^2,

where Σ' is the reconstructed spatial EEG covariance matrix of Σ. Combining the margin loss L_j with the reconstruction loss L_r, the total loss L of the fatigue network is obtained:

L = Σ_{j=1}^{m} L_j + λ_1 L_r.

The weighting parameter λ_1 is set to 0.0005 according to [24], ensuring that the reconstruction loss does not dominate the margin loss throughout training.
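A minimal sketch of the combined objective, assuming the reconstruction error is the plain sum of squared entry-wise differences between Σ and its reconstruction:

```python
import numpy as np

def total_loss(margin_sum, Sigma, Sigma_rec, lam1=0.0005):
    """L = sum_j L_j + lam1 * ||Sigma - Sigma_rec||^2 (entry-wise squared error)."""
    L_r = ((Sigma - Sigma_rec) ** 2).sum()
    return margin_sum + lam1 * L_r

Sigma = np.eye(24)                           # toy 24 x 24 covariance matrix
L = total_loss(1.0, Sigma, np.zeros_like(Sigma))   # margin part set to 1.0
```

With λ_1 = 0.0005, even a reconstruction error of 24 (zeroing out the identity above) contributes only 0.012 to the total, illustrating why the margin loss dominates.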

E. Implementation Details
We tuned only a few combinations of hyperparameters to determine the best model configuration. Even so, the performance of SACC-CapsNet significantly outperformed multiple relevant state-of-the-art methods (to be detailed in Section III). We tuned the hyperparameters as follows. In the temporal-channel attention module, there are no hyperparameters that need to be tuned. In the channel-connectivity attention module, only the depth of the convoluted connectivity features (i.e., D) needs to be tuned; D was finally set to 128, as this setting achieved the best performance in the search space of {32, 64, 128, 256}. In the capsule network module, most hyperparameters are set according to the original CapsNet design for convenience, and only the number of channels (i.e., E) in the PrimaryCaps layer needs to be tuned. We finally set E to 24 based on the model performance obtained when E was 16, 24, 32, and 40.
SACC-CapsNet was implemented in PyTorch on a workstation with an RTX 2080Ti GPU and an Intel Core i9-10900KF CPU. The model was optimized using the Adam optimizer with a learning rate of 0.0001, 50 epochs, and a batch size of 32.

III. EXPERIMENTS AND ANALYSIS
Extensive experiments were carried out to verify the effectiveness of SACC-CapsNet. We collected 24-channel EEG data from 31 healthy subjects in a simulated fatigue-driving environment, which served as the experimental data. First, we evaluated the overall performance of SACC-CapsNet across a variety of subjects. Subsequently, we compared SACC-CapsNet with several relevant models to highlight its advantages. In addition, ablation studies were conducted to determine the significance of each component in SACC-CapsNet. Furthermore, we investigated the critical connections and brain regions for detecting driving fatigue using the spatial covariance matrix learned from SACC-CapsNet. Finally, we examined the temporal stability of SACC-CapsNet and its ability to be generalized to new subjects.

A. Experimental Data Acquisition
The experimental data were obtained at the Cognitive Engineering Laboratory at the Centre for Life Sciences, National University of Singapore (NUS). Thirty-one healthy subjects (mean age=23.13±2.68 years, male/female=18/13), recruited via online advertisements from the NUS, were enrolled, and the experimental data were collected in a virtual reality simulated driving environment. All subjects met the inclusion criteria: no history of fatigue-related illnesses or mental disease, no intake of long-term medication, and at least seven hours of sleep each night for the two days prior to the start of the experiments. Each subject had extensive driving experience and was acquainted with the simulated driving environment. The research protocol was approved by the Institutional Review Board of the NUS, and written informed consent was obtained from all subjects.
As illustrated in Fig.2, the virtual reality simulated driving environment included a simulated driving system and a wireless dry EEG acquisition system. The simulated driving system featured a 65-inch LCD screen to provide a panoramic view of the road. Each experiment lasted 90 minutes. During the driving experiment, the EEG signals were recorded at a sampling frequency of 250 Hz with a 24-channel wireless EEG cap (model: HD-72, Cognionics Inc., USA). The electrodes were placed according to the International 10/20 system. Horizontal and vertical electrooculograms (HEOG & VEOG) were recorded from electrodes positioned at the outer canthi and the regions above and below the right eye. Throughout the recording, the electrode impedance was kept below 20 kΩ. Refer to our prior work [35] for detailed information on this setup.
The experiment was performed from 1 pm to 5 pm to increase the possibility of fatigue. Each subject repeated the same driving tasks twice with an interval of approximately one week or longer, yielding two sessions of fatigue-related data per subject.

B. Data Preprocessing and Validation Scheme
We adopted a standard EEG preprocessing as follows: The EEG signals were preprocessed with a bandpass filter of 1∼40 Hz and re-referenced by the common average reference (CAR). The artifact components were then removed using independent component analysis (ICA) [36].
We conducted a quantitative analysis of the subject's behavioral performance, using both the reaction time of the subject and the speed variation of the vehicle. Based on the statistical analysis of the obtained data, we adopted the first 10 minutes and the last 10 minutes to represent the alert state and the fatigue state, respectively. Additional information about fatigue state determination is provided in our previous study [30]. The preprocessed EEG signals (10-min signals labeled with the alert category and 10-min signals labeled with the fatigue category) were sliced into a series of 1-second non-overlapping segments. Each segment is treated as a sample. As a result, there are 1200 samples per subject, with 600 samples in each category (i.e., alert and fatigue).
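As a quick sanity check on the numbers above (250 Hz sampling rate, 1-second non-overlapping segments, 10 minutes per state), the per-subject sample counts follow directly:

```python
fs = 250                     # sampling rate in Hz (Section III-A)
seg_sec = 1                  # non-overlapping segment length in seconds
minutes_per_state = 10       # first 10 min = alert, last 10 min = fatigue

samples_per_state = minutes_per_state * 60 // seg_sec   # segments per category
points_per_sample = fs * seg_sec                        # time points per sample
total_per_subject = 2 * samples_per_state               # alert + fatigue
```

Each sample is thus a 24 × 250 array, with 600 samples per category and 1200 per subject.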
Although the validation method of "random train-test splits" has been widely used in EEG-based driving fatigue detection [20], [22], [23], [30], it is not rigorous. The reason is that the samples are obtained by dividing continuously varying EEG data, so the features of adjacent samples are often highly correlated due to the continuous cognitive process in the brain. Random train-test splits therefore often yield very high classification results because adjacent segments appear in both sets; this may lead to a significant drop in accuracy if the model is not exposed to highly correlated segments in real-world scenarios. Consequently, with a random train-test split, it is unclear whether the learned model can effectively detect driving fatigue or whether it has merely overfitted. To achieve a more generalized evaluation for each subject, following [37], a rigorous validation scheme was adopted in our experiment. Specifically, for each subject, we selected the first 90% of the samples of each category for training, and the last 10% of the samples of each category were reserved for testing. Notably, this train-test split is more consistent with the actual situation of driving fatigue prediction than related works where samples are randomly selected for training and testing [20], [22], [23], [30].
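The chronological split described above can be sketched as follows, applied per subject and per category; the key property is that every test sample is temporally later than every training sample, so no adjacent-segment leakage crosses the split.

```python
def chrono_split(samples, train_frac=0.9):
    """First 90% (in time order) for training, last 10% for testing."""
    k = int(len(samples) * train_frac)
    return samples[:k], samples[k:]

alert = list(range(600))          # indices of time-ordered samples of one category
train, test = chrono_split(alert)
```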
We considered the fatigue state as the positive class and the alert state as the negative class. We employed the metrics of accuracy, sensitivity, specificity, and F1-score to comprehensively assess the model's performance from different perspectives.
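For reference, the four metrics can be computed from the confusion-matrix counts, with fatigue as the positive class; the counts below are illustrative, not results from the paper.

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)            # recall on the fatigue (positive) class
    spec = tn / (tn + fp)            # recall on the alert (negative) class
    prec = tp / (tp + fp)
    f1 = 2 * prec * sens / (prec + sens)
    return acc, sens, spec, f1

# Illustrative counts for one subject's 120-sample test set
acc, sens, spec, f1 = metrics(tp=55, fn=5, tn=50, fp=10)
```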

C. Overall Performance
The detailed performances of each subject in the two sessions are depicted in Fig.3, which shows that the SACC-CapsNet model consistently performs well over the whole dataset. For session 1, the average accuracy is 94.17%, and all the subjects' accuracies surpass 80%. For session 2, the average accuracy is 90.59%, with the accuracies for 27 of 31 subjects exceeding 80%. These findings reflect that SACC-CapsNet can effectively and robustly distinguish between the alert state and fatigue state. The difference in accuracy (the standard deviation (STD) is 5.22% in session 1 and 8.30% in session 2) is possibly caused by the subjects' physical conditions and other uncontrollable circumstances [22], [30].
Additionally, for session 1, SACC-CapsNet achieves an average sensitivity of 95.91% and an average specificity of 92.42%. For session 2, SACC-CapsNet achieves an average sensitivity of 93.82% and an average specificity of 87.37%. These results indicate that SACC-CapsNet performs well on both categories. For SACC-CapsNet, the sensitivity is higher than the specificity, suggesting that the fatigue state is easier to identify in both sessions; this suits the practical requirements of driving fatigue detection, since the fatigue state poses a serious threat to road safety. The classification ability of SACC-CapsNet is also indicated by the F1-score, which is 95.94% and 93.99% on average in session 1 and session 2, respectively. From the various aspects discussed above, it can be concluded that SACC-CapsNet is a promising approach for detecting driving fatigue.

D. Method Comparisons
To explore the superiority of SACC-CapsNet, we compared SACC-CapsNet with several relevant models. We searched the parameter space of the compared models according to the descriptions of the corresponding papers. A brief overview of these models is provided as follows: PSD-SVM [38]: This model extracts PSD features and employs SVM to predict driving fatigue.
PLI-SVM [15]: This model extracts PLI features and employs SVM to predict driving fatigue.
PLV-SVM: This model replaces the input features of PLI-SVM with PLV features.
EEG Image-based CNN [19]: This model transforms the recorded EEG signals into image-like data and then passes them to a CNN for driving fatigue prediction.
EEG-Conv-R [20]: This model combines a CNN with deep residual learning for driving fatigue prediction based on preprocessed EEG data.
ShallowConvNet [39]: This model is a shallow CNN designed for EEG-based classification with inputs of preprocessed EEG data.
EEGNet [40]: This model is a compact CNN designed for EEG-based classification with inputs of preprocessed EEG data.
CNN-Attention [23]: This model combines a CNN with a self-attention mechanism for detecting driving fatigue with inputs of preprocessed EEG data.
AMCNN-DGCN [30]: This model links an attention-based multiscale CNN with a dynamical GCN for driving fatigue detection with inputs of preprocessed EEG data.
The performances of each method are plotted in Fig.4. SACC-CapsNet is shown to achieve the best performance on both sessions, followed by EEGNet, AMCNN-DGCN, and CNN-Attention, respectively. Additionally, we can make the following observations.
For shallow learning methods, PLI-SVM and PLV-SVM outperform PSD-SVM, which indicates that utilizing intrinsic inter-channel relations can improve the accuracy of driving fatigue detection. Compared to most deep learning models, PSD-SVM, PLI-SVM, and PLV-SVM have low recognition rates since they use handcrafted features and cannot merge the feature extraction and classification modules into a unified model, resulting in the loss of some crucial information.
For deep learning models, AMCNN-DGCN is more accurate than the EEG Image-based CNN, EEG-Conv-R, ShallowConvNet, and CNN-Attention, owing to its elaborate design and its ability to dynamically learn the interactions between multiple brain regions. However, AMCNN-DGCN may overfit due to the limited training data. On the other hand, EEGNet, with its compact architecture, alleviates the overfitting problem while simultaneously exploring the spatial-temporal information of EEG data. Therefore, EEGNet outperforms AMCNN-DGCN under small training data and is the second-best model in our experiments. However, EEGNet is still imperfect because it fails to effectively model the interactions between multiple brain regions and the relationships between local and global features. In contrast, SACC-CapsNet can learn the relationships between different regions more effectively; therefore, it extracts highly discriminative features and produces better results. For session 1, SACC-CapsNet outperforms the second-best model, i.e., EEGNet, by 5.56%, 5.65%, 5.48%, and 5.53% in terms of accuracy, sensitivity, specificity, and F1-score, respectively. For session 2, SACC-CapsNet outperforms EEGNet by 4.62%, 3.49%, 5.75%, and 3.59% in terms of accuracy, sensitivity, specificity, and F1-score, respectively. The experimental results were further analyzed using the Wilcoxon signed-rank test [41]. The test results (p<0.001) indicate that SACC-CapsNet performs statistically significantly better than the other methods.

E. Ablation Studies
SACC-CapsNet consists of three components: the temporal-channel attention (TCA) module, the channel-connectivity attention module (spatial covariance matrix (SCM) + selective kernel attention (SKA)), and the capsule network (CapsNet) module. Ablation studies were carried out to evaluate the influence of each component of SACC-CapsNet.
As depicted in Fig. 5, all components of SACC-CapsNet are essential and complementary to each other. The baseline capsule network operating directly on the EEG data shows limited performance. By incorporating temporal-channel attention (TCA), which explores the critical temporal information and crucial channels for driving fatigue detection, we observe average improvements of 2.50% and 4.48% in accuracy and F1-score, respectively. Adding the spatial covariance matrix (SCM), which models the interactions between multiple brain regions and can be regarded as a type of connectivity feature, further improves the average accuracy and F1-score by 3.92% and 4.65%, respectively. Furthermore, leveraging selective kernel attention (SKA) at varying scales helps to automatically capture highly discriminative features, yielding average improvements of 5.39% in accuracy and 5.66% in F1-score. In summary, integrating all components yields the best performance for driving fatigue detection.
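A minimal NumPy sketch of the SCM step, assuming the refined EEG is a channels × time-points array; the function name, channel count, and sampling parameters are illustrative, not taken from the released implementation.

```python
import numpy as np

def channel_covariance(eeg):
    """Map refined EEG (n_channels, n_samples) to an
    n_channels x n_channels spatial covariance matrix."""
    eeg = eeg - eeg.mean(axis=1, keepdims=True)   # remove per-channel mean
    return eeg @ eeg.T / (eeg.shape[1] - 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((24, 384))   # e.g. 24 channels, 3 s at 128 Hz
scm = channel_covariance(x)
print(scm.shape)                     # square, symmetric matrix
```

Entry (i, j) of `scm` quantifies the co-variation of channels i and j, which is what the channel-connectivity attention module subsequently operates on.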
To show the benefit of the capsule network, we replaced the capsule network module with a standard CNN (denoted as SACC-CNN), which consists of three convolutional layers with 256, 256, and 128 channels, each using 5 × 5 kernels with a stride of 1, followed by two fully connected layers of size 328 and 192 with dropout and a softmax layer. More details can be found in the original CapsNet paper [24]. We observe that SACC-CapsNet significantly outperforms SACC-CNN by 2.69% and 3.52% in terms of accuracy and F1-score, respectively, since the capsule network can model the relationships between local and global features, overcoming the CNN's inability to learn relationships between elements.

F. Study of Critical Connections and Brain Regions
The important connections for driving fatigue detection are investigated according to the refined spatial EEG covariance matrix S̃. Let S̃_alert be the spatial EEG covariance matrix of a sample belonging to the alert state and S̃_fatigue be that of a sample belonging to the fatigue state. The alteration of the covariance matrix from the alert state to the fatigue state is estimated as

ΔS̃ = S̃_fatigue − S̃_alert,

where ΔS̃ is referred to as the alteration of connections, and the critical connections for driving fatigue detection are reflected by its absolute value, i.e., |ΔS̃|. The averaged critical connections among the 31 subjects are summarized in Fig. 6, which shows that the top critical connections are mostly linked to the frontal pole, indicating that nodes in the frontal pole are more active in forming connections with the other nodes across the whole brain.
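The connection-alteration computation described above can be sketched in NumPy as follows; the covariance helper, channel count, and sample shapes are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def covariance(eeg):
    """Spatial covariance of an EEG segment (n_channels, n_samples)."""
    eeg = eeg - eeg.mean(axis=1, keepdims=True)
    return eeg @ eeg.T / (eeg.shape[1] - 1)

rng = np.random.default_rng(1)
alert_eeg   = rng.standard_normal((24, 384))   # refined EEG, alert state
fatigue_eeg = rng.standard_normal((24, 384))   # refined EEG, fatigue state

s_alert   = covariance(alert_eeg)
s_fatigue = covariance(fatigue_eeg)

delta    = s_fatigue - s_alert   # alteration of connections
critical = np.abs(delta)         # |delta| reflects the critical connections

# e.g. locate the channel pair with the strongest alteration
i, j = np.unravel_index(np.argmax(critical), critical.shape)
```

Ranking the entries of `critical` by magnitude is one way to surface the top connections that Fig. 6 visualizes.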
Furthermore, to investigate the crucial brain regions associated with driving fatigue, we adopted degree centrality, a widely employed indicator that evaluates the importance of a node in a graph by quantifying its connections with the other nodes [42], [43]. As previously mentioned, |ΔS̃| denotes the absolute value of the alterations of connections between channels, and each channel is treated as a node of the graph. The degree centrality C̃_i of the i-th EEG channel is calculated by

C̃_i = Σ_j |ΔS̃_{ij}|,

where the sum runs over all channels j. For all channels, we obtain C̃ = [C̃_1, ..., C̃_N], where N is the number of channels. With the raw, refined, and reconstructed spatial EEG covariance matrices (i.e., S, S̃, and S′), we calculated the corresponding degree centralities (i.e., C, C̃, and C′, respectively). The averaged degree centrality for each channel among the 31 subjects is summarized in Fig. 7. For better visualization, the values of C, C̃, and C′ were scaled to the interval [0, 1]. As shown in Fig. 7, the strongest activations are in the frontal pole, followed by the parietal and central regions, indicating that these brain areas may be closely linked to driving fatigue. Our findings are consistent with previous studies [9], [15], [35], [44]. Driving demands that drivers remain vigilant of their surroundings while simultaneously performing various driving tasks. This process requires the activation of multiple brain regions, including the frontal lobe, which is responsible for maintaining sustained attention, and the parietal regions, which integrate sensory information from vision and hearing while processing the spatial cognition vital for driving. Additionally, the central region collaborates with the parietal lobe to facilitate the driver's control of the vehicle's acceleration, braking, and steering, ensuring safe and smooth driving. These brain regions are essential for the successful completion of a driving task.
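The degree-centrality computation and the [0, 1] scaling used for visualization can be sketched as below; excluding the diagonal (self-connections) and the 24 × 24 toy matrix are our assumptions for illustration.

```python
import numpy as np

def degree_centrality(abs_delta):
    """Per-channel degree centrality: sum of |alteration| to the other
    nodes, then min-max scaled to [0, 1] for visualization."""
    a = abs_delta.copy()
    np.fill_diagonal(a, 0.0)        # ignore self-connections (assumption)
    c = a.sum(axis=1)
    return (c - c.min()) / (c.max() - c.min())

rng = np.random.default_rng(2)
d = np.abs(rng.standard_normal((24, 24)))
d = (d + d.T) / 2                    # symmetric toy |delta| matrix
c = degree_centrality(d)
print(c.argmax())                    # index of the most central channel
```

The same routine can be applied to the raw, refined, and reconstructed covariance matrices to obtain the three centrality profiles compared in the text.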
Interestingly, the critical brain regions can be strengthened after the temporal-channel attention (TCA), showing that TCA in SACC-CapsNet can explore the critical temporal information and crucial channels for driving fatigue detection, enhancing the discriminative ability of the EEG signals for driving fatigue detection. Additionally, the reconstructed spatial EEG covariance matrix generated by SACC-CapsNet can maintain information about the critical brain regions.

G. Performance of SACC-CapsNet Across Sessions and Subjects
We conducted cross-session driving fatigue detection to evaluate the stability of SACC-CapsNet over time and cross-subject driving fatigue detection to assess its generalization capability for new subjects. The validation approach for each evaluation is as follows:
a) Cross-session driving fatigue detection: Each subject has two sessions of data. We excluded subjects with an interval greater than one week between the two sessions, leaving the data of 20 subjects for this experiment. For each subject, session one was used as the training data and session two as the testing data. The average accuracy was calculated across all test subjects.
b) Cross-subject driving fatigue detection: For each session, we employed leave-one-subject-out cross-validation, where the model was trained on data from 30 subjects and tested on the remaining subject. The average accuracy was calculated across all test subjects over both sessions.
To demonstrate the superiority of SACC-CapsNet, we included the comparative models, as shown in Fig. 8.
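The leave-one-subject-out protocol in b) can be sketched as a fold generator; the subject IDs and trial counts below are toy values (our experiments use 31 subjects per session).

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (held-out subject, train indices, test indices) per fold."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test  = np.where(subject_ids == s)[0]
        train = np.where(subject_ids != s)[0]
        yield s, train, test

# Toy example: 4 subjects with 3 trials each
ids = np.repeat([1, 2, 3, 4], 3)
for subject, train_idx, test_idx in leave_one_subject_out(ids):
    # train the model on train_idx, evaluate on test_idx,
    # then average accuracy across all held-out subjects
    pass
```

Grouping the split by subject (rather than by trial) is what makes the protocol a genuine test of generalization to unseen subjects.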
The results indicate that SACC-CapsNet outperforms other methods in detecting driving fatigue, which aligns with the findings in Section III-D. However, it is expected that the detection performance of SACC-CapsNet significantly degrades over time due to the non-stationary nature of EEG signals. Additionally, the cross-subject detection performance is reduced dramatically due to inter-subject variability. Therefore, further research should be conducted to explore the combination of SACC-CapsNet with transfer learning methods [45] to enhance the accuracy of cross-session and cross-subject driving fatigue detection.

IV. CONCLUSION
This paper proposed SACC-CapsNet to explore the critical temporal-channel information, model the inter-channel relations, and automatically capture highly discriminative and robust features for driving fatigue detection. SACC-CapsNet was comprehensively evaluated using multiple metrics, validation protocols, and extensive ablation studies, demonstrating its accuracy and robustness in detecting driving fatigue from various perspectives. Additionally, the analysis of the spatial EEG covariance matrix learned from SACC-CapsNet reveals that the frontal pole exhibits the most vital connections, followed by the parietal and central regions, indicating their dominant role in driving fatigue detection. Notably, the implementation of temporal-channel attention in SACC-CapsNet further strengthens information about these critical brain regions. Moreover, the reconstructed covariance matrix generated by SACC-CapsNet effectively preserves a significant amount of information within these regions.
The proposed SACC-CapsNet model is a general framework based on multi-channel EEG, which can be easily extended to other similar tasks, including sustained-attention driving tasks [46], motor imagery classification, and emotion recognition. However, SACC-CapsNet is limited in that it requires data preprocessing steps, such as ICA-based artifact removal. In future research, we will explore alternative approaches, such as autoencoders, to reduce the reliance on such preprocessing.