Dynamic Multi-Graph Convolution-Based Channel-Weighted Transformer Feature Fusion Network for Epileptic Seizure Prediction

Electroencephalogram (EEG) based seizure prediction plays an important role in closed-loop neuromodulation systems. However, most existing seizure prediction methods based on graph convolution networks focus only on constructing a static graph, ignoring multi-domain dynamic changes in the deep graph structure. Moreover, existing feature fusion strategies generally concatenate coarse-grained epileptic EEG features directly, leading to suboptimal seizure prediction performance. To address these issues, we propose a novel multi-branch dynamic multi-graph convolution based channel-weighted transformer feature fusion network (MB-dMGC-CWTFFNet) for patient-specific seizure prediction. Specifically, a multi-branch (MB) feature extractor is first applied to jointly capture the temporal, spatial and spectral representations from the epileptic EEG. Then, we design a point-wise dynamic multi-graph convolution network (dMGCN) to dynamically learn deep graph structures, which can effectively extract high-level features from the multi-domain graph. Finally, by integrating the local and global channel-weighted strategies with the multi-head self-attention mechanism, a channel-weighted transformer feature fusion network (CWTFFNet) is adopted to efficiently fuse the multi-domain graph features. The proposed MB-dMGC-CWTFFNet is evaluated on the public CHB-MIT EEG dataset and a private intracranial sEEG dataset, and the experimental results demonstrate that our proposed method achieves outstanding prediction performance compared with state-of-the-art methods, indicating an effective tool for patient-specific seizure warning. Our code will be available at: https://github.com/Rockingsnow/MB-dMGC-CWTFFNet.


I. INTRODUCTION
EPILEPSY is one of the most common diseases of the nervous system, producing recurrent seizures and threatening patients' lives [1]. Currently, more than 50 million people worldwide suffer from epilepsy, and approximately 30% of patients develop refractory epilepsy despite both drug and surgical treatment [2]. Fortunately, seizure prediction based on electroencephalography (EEG) provides an additional solution for these refractory epilepsy patients: it can give early warning for advanced neuromodulation treatments [3], so as to suppress seizures effectively. Previous studies divided long-term epileptic EEG recordings into four neurophysiological periods: inter-ictal, pre-ictal, ictal and post-ictal [4], [5]. Therefore, the core problem of epileptic seizure prediction is how to accurately distinguish the pre-ictal period from the inter-ictal period, enabling intelligent warning before seizure onset for patients and clinicians [6].
For automatic EEG seizure prediction, the primary challenge is to extract discriminative EEG features of the epileptic activity. Due to the high temporal resolution of EEG, the long short-term memory (LSTM) [7], [8] was introduced into seizure prediction models to capture the temporal information of the epileptic EEG. In addition, to exploit the spectral representation of epileptic rhythms, the wavelet transformation [9] and the short-time Fourier transform [10] were combined with the convolutional neural network (CNN), which can learn quantitative time-frequency characteristics to facilitate the classification of inter-ictal and pre-ictal periods. Moreover, Ahmet et al. [4] proposed a 3D-CNN seizure prediction framework to evaluate the spatio-temporal evolution correlation from multi-channel EEG time series. Zhang et al. [11] designed a common spatial pattern filter to extract distinguishing spatial features from epileptic EEG, which were further fed into a shallow CNN to discriminate between the pre-ictal and inter-ictal states. However, these methods only obtained coarse-grained EEG features in single or multiple domains in a fixed manner, without taking full advantage of the patient-specific temporal, spectral and spatial signatures simultaneously, which may lead to the loss of essential epileptic activity information. Thus, a multi-branch feature extractor is needed to capture multi-level fine-grained representations from the epileptic EEG in multiple domains.
Another existing issue is that the CNN framework in the seizure prediction task can only learn low-dimensional spatial correlations among EEG channels, due to its regular convolution operation and local receptive field [12]. It is therefore difficult to track the complex non-Euclidean structure of epileptic seizures [13]. To deal with this problem, the graph convolutional network (GCN) has been investigated in recent seizure prediction studies [14], [15]. The common procedure in GCN is to define a prior adjacency matrix for constructing the graph structure among channels, which helps to convert the epileptic EEG signals into a graph representation with graph nodes and edges [16]. For example, Wang et al. [17] employed the phase locking value (PLV) of EEG data to construct the adjacency matrix of graph edges. The differential entropy (DE) was applied in the inference of the spatial coupling in network topology to calculate the temporal correlations of EEG and yield the graph nodes [18]. Unfortunately, these GCN methods based on information theory depended on handcrafted features to generate EEG graphs, neglecting the dynamic changes in patient-specific graph construction. Lian et al. [14] developed a joint graph structure and representation learning network (JGRN) to predict seizures, where the graph structures can be jointly optimized with patient-specific connection weights of temporal channels. A similar study proposed a subject-independent seizure predictor using geometric deep learning, realizing seizure prediction from LSTM EEG graph synthesis [15]. Notably, most of these models ignored the spatial position relationship among EEG channels and focused only on single, shallow, static EEG graph construction without spatial position guidance, which cannot fully represent the dynamic changes of individualized channel connectivity in multiple domains. Therefore, a novel GCN is required to jointly characterize high-level multi-domain features and map patient-specific dynamic EEG graph representations.
Additionally, in order to integrate comprehensive feature information for precise seizure prediction, some feature fusion strategies were designed to fuse EEG features from different scales and domains. For example, Li et al. [19] adopted a temporal-spectral squeeze-and-excitation scheme to fuse the hierarchical multi-domain representations of epileptic EEG, which reduced the information redundancy of high-dimensional features. Gao et al. [20] combined the attention mechanism with dilated convolution to aggregate spatio-temporal multi-scale features, providing a promising solution for EEG-based seizure prediction. Although these feature fusion methods obtained a comprehensive feature, they only considered the general fusion of low-level features in Euclidean space [21]. High-level EEG graph node features, embedded in non-Euclidean graph structures, need a dedicated fusion approach to enable robust seizure prediction.
The main motivation of our study is to overcome the limitations of existing prediction methods, including coarse-grained EEG features in a single domain, shallow static EEG graph construction without spatial position guidance, and difficulties in high-level graph feature fusion. Thus, we propose a novel multi-branch dynamic multi-graph convolution based channel-weighted transformer feature fusion network (MB-dMGC-CWTFFNet) for patient-specific seizure prediction. First, a multi-branch (MB) feature extractor is used to capture multi-level fine-grained features from epileptic EEG in multiple domains. Second, in order to extract multi-domain graph features, a point-wise dynamic multi-graph convolution network (dMGCN) is constructed to adaptively learn the three-view dynamic graph structure with spatial position guidance. Finally, we investigate a channel-weighted transformer feature fusion network (CWTFFNet) to efficiently fuse the multi-domain graph features, which introduces the channel-weighted self-attention mechanism to map discriminative fused representations for seizure prediction. The proposed MB-dMGC-CWTFFNet is evaluated on two kinds of epileptic datasets, i.e., the CHB-MIT EEG dataset and our Xuanwu intracranial stereo-electroencephalography (sEEG) dataset, and achieves promising performance compared with state-of-the-art methods, which validates its capability in the seizure prediction task.
In general, the main contributions of our study are summarized as follows:
1) A novel MB-dMGC-CWTFFNet is proposed to predict seizures for the individual epilepsy patient, which can efficiently fuse multi-domain graph features, yielding the highest prediction performance on both the CHB-MIT and Xuanwu datasets.
2) We design an MB feature extractor, including three parallel sub-branches in the temporal, spatial and frequency domains respectively, to jointly capture multi-level fine-grained features, which compensates for the inadequate representation of coarse-grained EEG features in traditional feature extractors.
3) A dMGCN is constructed from a point-wise dynamic graph neural network, which can learn the dynamic changes of three-view graph structures with spatial position guidance and extract deep multi-domain graph features, and thus overcomes the insufficient expression of spatial connectivity in shallow static EEG graphs.
4) A CWTFFNet is developed by introducing both local and global channel-weighted self-attention into the transformer network. The local graph edge weights are complementary to the global channel position information, which enables more efficient fusion of high-level graph features than current feature fusion strategies.

II. METHODOLOGY
The seizure prediction framework of our proposed MB-dMGC-CWTFFNet is displayed in Fig. 1. The overall architecture mainly consists of the multi-branch feature extractor, the point-wise dynamic multi-graph convolution network and the channel-weighted transformer feature fusion network, summarized as follows:
1) The MB feature extractor is primarily designed to extract the multi-domain temporal-spatial-spectral features from EEG signals.
2) The dMGCN is further employed to transform the temporal-spatial-spectral features into high-level graph representations from temporal, spatial and spectral views.
3) The CWTFFNet is adopted to obtain the fused feature maps, and the fully connected layers are utilized to generate the final recognition results.
The well-trained MB-dMGC-CWTFFNet is then transformed into a practical seizure warning system by a post-processing strategy. Details of each step are given in the following subsections.

A. Multi-Branch Feature Extractor
The epileptic EEG signals are defined as $E = \{(x_i, y_i) \mid i = 1, 2, \ldots, N\}$, where $x_i \in \mathbb{R}^{C \times S}$ represents the $i$-th EEG trial with $C$ channels and $S$ sampling points, $N$ is the total number of EEG trials, and $y_i$ is the binary label (pre-ictal or inter-ictal state) corresponding to $x_i$.
Considering the individualized differences of epileptic activities in the time, frequency and spatial domains, we first construct the MB feature extractor to capture the temporal-spatial-spectral representations from epileptic EEG signals. As shown in Fig. 1, the MB feature extractor includes three sub-branches: the multi-scale temporal-conv branch, the multi-band spectral-conv branch and the multi-channel spatial-encoding branch.
1) Multi-Scale Temporal-Conv Branch: Epileptic seizure recordings involve critical electrophysiological fluctuations from the inter-ictal period to the pre-ictal period [22]. In order to capture comprehensive temporal information from EEG, which has high resolution in the time domain, a multi-scale temporal-conv branch is first designed with $n$ parallel temporal convolution (TConv) layers. Thus, we obtain multi-scale temporal features of different sizes from TConv-1 to TConv-$n$, where $T_k$ is the output scale of the temporal feature from the $k$-th TConv. Additionally, batch normalization and the exponential linear unit (ELU) are applied in each TConv of the multi-scale temporal-conv branch to accelerate the training and convergence of the proposed model. Accordingly, these multi-scale temporal features are concatenated to generate the overall feature map in the time domain, $F_T \in \mathbb{R}^{C \times D_T}$.
2) Multi-Band Spectral-Conv Branch: Previous studies have shown that epileptic activities may occur at different frequencies for different epilepsy patients [23]. Thus, according to the five clinical frequency sub-bands, namely the $\delta$ band (0-4 Hz), $\theta$ band (4-8 Hz), $\alpha$ band (8-13 Hz), $\beta$ band (13-30 Hz) and $\gamma$ band (30-50 Hz) [24], the multi-band spectral-conv branch is built from hierarchical wavelet convolutions (WaveConv) based on the Daubechies order-4 (Db4) wavelet [25]. The wavelet decomposition, which has high correlation coefficients with the epileptic signal [26], is applied to the EEG trials to obtain the wavelet spectral features in the five sub-bands. The hierarchical WaveConv layers perform successive spectral analysis by means of an $L$-level iteration, where $L = \lfloor \log_2(f_s) \rfloor - 3$ is determined by the EEG sampling rate $f_s$, and $\lfloor \cdot \rfloor$ represents the rounding-down operation [27]. The frequency boundaries of the $l$-th WaveConv are then $(0, f_s/2^{l+1})$ and $(f_s/2^{l+1}, f_s/2^{l})$, respectively, where $l = 1, 2, \ldots, L$.
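As a worked illustration of the level count and band boundaries stated above, the following minimal sketch evaluates $L$ and the two bands split off at each level for $f_s = 256$ Hz. The function names `wavelet_levels` and `band_edges` are hypothetical helpers introduced only for this example; they do not perform the Db4 filtering itself.

```python
import math


def wavelet_levels(fs: float) -> int:
    """Number of WaveConv levels, L = floor(log2(fs)) - 3."""
    return math.floor(math.log2(fs)) - 3


def band_edges(fs: float, level: int):
    """Return the low and high frequency bands (in Hz) produced at a given level."""
    cut = fs / 2 ** (level + 1)
    return (0.0, cut), (cut, fs / 2 ** level)


fs = 256  # sampling rate in Hz
L = wavelet_levels(fs)
for level in range(1, L + 1):
    low, high = band_edges(fs, level)
    print(f"level {level}: low band {low} Hz, high band {high} Hz")
```

For $f_s = 256$ Hz this gives $L = 5$, with dyadic band edges at 64, 32, 16, 8 and 4 Hz, so the decomposition approximates the clinical sub-band limits rather than matching them exactly.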
After inputting the EEG trial $x_i \in \mathbb{R}^{C \times S}$ into the multi-band spectral-conv branch, we obtain the multi-band wavelet spectral features corresponding to the five standard physiological frequency sub-bands, where $H = S/2^{l}$ is the output dimension of the wavelet spectral features generated from the $l$-th WaveConv. Additionally, because WaveConv performs a time-frequency analysis similar to the discrete wavelet transform, its operators have no learnable parameters during feature extraction; their weights are fixed and given by the Db4 wavelet filter. The five-band wavelet spectral features are then concatenated into the integral spectral feature map in the frequency domain, $F_R \in \mathbb{R}^{C \times D_R}$.
3) Multi-Channel Spatial-Encoding Branch: Apart from the multi-scale temporal-conv branch and the multi-band spectral-conv branch, we also propose a multi-channel spatial-encoding branch to extract the representations of channel mapping in the spatial domain. Specifically, the multi-channel EEG trials are transposed into channel-wise slices, which are fed into the channel position encoder and the spatial feature encoder to complete the channel correlation construction and the spatial feature extraction, respectively. For the channel position encoder, a distance set $U$ is first established as $U = \{u_{ij} \mid i, j \in [1, C], i \neq j\}$, where $u_{ij}$ represents the Euclidean distance between the $i$-th channel and the $j$-th channel, and $C$ is the total number of channels. The distances $u_{ij}$ are obtained from the international standard electrode system [28]. The initialized channel adjacency matrix $A \in \mathbb{R}^{C \times C}$ is then generated from $U$ by a position embedding method, in which $M(\cdot)$ is the mean operation and $a_{ij}$ is the element in the $i$-th row and the $j$-th column of $A$. Therefore, the channel adjacency matrix $A$ contains the global position information of the multi-channel relationship, which will be used to construct the dynamic graph in the following point-wise dMGCN. Additionally, we adopt the spatial feature encoder based on the channel-wise spatial convolution [29], [30] to extract the multi-channel spatial characteristics, which are ultimately concatenated and reshaped as $F_S \in \mathbb{R}^{C \times D_S}$, where $D_S$ is the output dimension of the integrated spatial feature map.
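The distance set $U$ used by the channel position encoder can be sketched as follows. This is only an illustration: the electrode coordinates are hypothetical placeholder values rather than positions from the international standard electrode system, and the position embedding that maps $U$ to the adjacency matrix $A$ is not reproduced here.

```python
import math
from itertools import combinations

# Hypothetical 3-D electrode coordinates (arbitrary units), for illustration only.
positions = {
    "Fp1": (-0.3, 0.9, 0.3),
    "F3": (-0.5, 0.5, 0.7),
    "C3": (-0.7, 0.0, 0.7),
    "P3": (-0.5, -0.5, 0.7),
}


def distance_set(pos):
    """Build U = {u_ij | i != j}: pairwise Euclidean distances between channels."""
    names = list(pos)
    U = {}
    for i, j in combinations(range(len(names)), 2):
        d = math.dist(pos[names[i]], pos[names[j]])
        U[(i, j)] = d  # distance is symmetric, so store both orderings
        U[(j, i)] = d
    return U


U = distance_set(positions)
mean_distance = sum(U.values()) / len(U)  # M(U), the mean operation over the set
print(len(U), round(mean_distance, 3))
```

With $C$ channels the set holds $C(C-1)$ ordered pairs, which is why the diagonal ($i = j$) is excluded from $U$.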

B. Point-Wise Dynamic Multi-Graph Convolution Network
To further learn the deep dynamic connectivity of different brain regions for the individual epilepsy patient, a novel point-wise dMGCN is proposed in this subsection to extract multi-domain graph features. Three synchronized dynamic graph convolution networks, corresponding to the temporal, spatial and spectral views, constitute the point-wise dMGCN in Fig. 2; they explore the deep channel relationships of the temporal feature map $F_T$, the spatial feature map $F_S$ and the wavelet spectral feature map $F_R$, respectively. For each graph convolution view, the initialized adjacency matrix $A \in \mathbb{R}^{C \times C}$, depicting the original distance between any two channels, has been calculated by the channel position encoder of the MB feature extractor. To further guide the dynamic evolution of the channel relationship in the three views, a self-gating strategy is applied to the initialized adjacency matrix $A$. In this strategy, $\tilde{A}_1, \tilde{A}_2, \tilde{A}_3 \in \mathbb{R}^{(C \times C) \times 1}$ are reshaped from $A \in \mathbb{R}^{C \times C}$; $W_{12}, W_{22}, W_{32} \in \mathbb{R}^{((C \times C)/r) \times (C \times C)}$ and $W_{11}, W_{21}, W_{31} \in \mathbb{R}^{(C \times C) \times ((C \times C)/r)}$ are the weight matrices of fully-connected layers; $r$ is the reduction ratio; and $\delta(\cdot)$ and $\sigma(\cdot)$ are the ELU activation function and the rectified linear unit (ReLU), respectively. The three dynamic adjacency matrices $A_T, A_S, A_R \in \mathbb{R}^{C \times C}$, corresponding to the temporal, spatial and spectral graph convolution nets, are acquired by reshaping $\tilde{A}_T, \tilde{A}_S, \tilde{A}_R \in \mathbb{R}^{(C \times C) \times 1}$ into $\mathbb{R}^{C \times C}$.
After constructing the dynamic connectivity of epileptic activities from the three views, the dynamic graph convolution is performed on the temporal feature map $F_T \in \mathbb{R}^{C \times D_T}$, the spatial feature map $F_S \in \mathbb{R}^{C \times D_S}$ and the wavelet spectral feature map $F_R \in \mathbb{R}^{C \times D_R}$, respectively. It produces $G_T, G_S, G_R$, the dynamic graph features corresponding to $F_T, F_S, F_R$ with the hidden non-Euclidean topology in epileptic activities. The formulation uses the degree matrices corresponding to $A_T, A_S, A_R$ respectively, together with two weight matrices per view, of size $D_T \times D_T$ for the temporal view, $D_S \times D_S$ for the spatial view and $D_R \times D_R$ for the spectral view, which are the convolution kernels of the point-wise convolution unit [31]. Therefore, we obtain the dynamic multi-domain graph features $G_T, G_S, G_R$ with their corresponding dynamic adjacency matrices $A_T, A_S, A_R$, which will be fed into the CWTFFNet to conduct the final feature fusion in the next subsection.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
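A minimal sketch of one graph convolution view is given below. It assumes the widely used symmetric normalization $D^{-1/2} A D^{-1/2}$ of the adjacency matrix, a single weight matrix and an ELU activation; these choices, and the helper names `normalize_adjacency` and `graph_conv`, are assumptions made for illustration, and the exact point-wise dMGCN formulation may differ.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} A D^{-1/2}, where D is the diagonal degree matrix of A."""
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in adj]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def elu(x):
    """Exponential linear unit with alpha = 1."""
    return x if x > 0 else math.exp(x) - 1.0


def graph_conv(adj, features, weight):
    """One propagation step: ELU(normalized adjacency x features x weight)."""
    out = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[elu(v) for v in row] for row in out]


# Toy example: C = 3 channels, feature dimension D = 2 (all values illustrative).
A_T = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
F_T = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
G_T = graph_conv(A_T, F_T, W)
print(G_T)
```

The output keeps the shape $C \times D$ of the input feature map, so a view implemented this way can be stacked or passed on to a fusion stage without reshaping.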

C. Channel-Weighted Transformer Feature Fusion Network
To further fuse the high-level graph features $G_T, G_S, G_R$, the CWTFFNet is proposed by combining the dynamic adjacency matrices $A_T, A_S, A_R$ with the multi-head self-attention mechanism. As shown in Fig. 3, the CWTFFNet can be divided into a local channel-weighted multi-head self-attention (Local CW-MHSA), a global channel-weighted feature fusion block (Global CW-FFB) and a multi-layer perceptron (MLP).
The Local CW-MHSA consists of three heads: the $A_T$-weighted self-attention unit (SAU), the $A_S$-weighted SAU and the $A_R$-weighted SAU. For each self-attention head, three kinds of weight matrices are initially introduced to encode the input graph features, with dimensions set by the hyperparameters $d_K$ and $d_V$. The query $Q$, the key $K$ and the value $V$ are then calculated from these weight matrices and $G$, where $G \in \{G_T, G_S, G_R\}$ indicates the three kinds of graph features. We introduce the local channel-weighted strategy by applying the dynamic adjacency matrices in the multi-head self-attention mechanism. The local features $Z_{local} \in \mathbb{R}^{C \times d_K}$ from the Local CW-MHSA are then obtained from $Z_T, Z_S, Z_R$, the outputs of the three self-attention heads, by means of the concatenation function $\mathrm{Concat}(\cdot)$. To further capture the global information, we apply the Global CW-FFB and the MLP to the local features $Z_{local}$.
The model is trained with the loss of Eq. (17), where $p_i$ is the conditional probability of the $i$-th EEG trial output by the proposed MB-dMGC-CWTFFNet, $l_j$ is a class from the label set, $\phi(\cdot)$ represents the indicator function, $N$ is the total number of samples, and $M = 2$ is the number of classes. The term $\lambda \|\theta\|$ is the trade-off regularization term of Eq. (17) and aims to alleviate overfitting during model training, where $\lambda$ is the regularization parameter and $\theta$ denotes the updatable parameters of the model. As a result, a personalized, well-trained MB-dMGC-CWTFFNet is generated, which is then used for individual seizure prediction through the following post-processing [32].
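A loss with the ingredients named above (class probabilities, an indicator over the $M = 2$ classes, averaging over $N$ trials and a $\lambda \|\theta\|$ penalty) can be sketched as a regularized cross-entropy. The cross-entropy form and the Euclidean norm are assumptions for illustration, and `regularized_cross_entropy` is a hypothetical helper, not the authors' implementation.

```python
import math


def regularized_cross_entropy(probs, labels, params, lam):
    """Mean cross-entropy over N trials plus lam * ||params||_2.

    probs[i][j] is the predicted probability of class j for trial i,
    labels[i] is the true class index of trial i.
    """
    total = 0.0
    for p_i, y_i in zip(probs, labels):
        for j, p_ij in enumerate(p_i):
            if y_i == j:  # indicator function: only the true class contributes
                total -= math.log(max(p_ij, 1e-12))  # clamp to avoid log(0)
    cross_entropy = total / len(probs)
    penalty = lam * math.sqrt(sum(w * w for w in params))
    return cross_entropy + penalty


# Two trials, two classes (0 = inter-ictal, 1 = pre-ictal); values are illustrative.
probs = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 1]
theta = [0.3, -0.4]
print(regularized_cross_entropy(probs, labels, theta, lam=0.01))
```

A larger `lam` shrinks the parameters more strongly, trading training fit for reduced overfitting, which is the role the text assigns to the regularization term.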

D. Post-Processing Strategy
Finally, the well-trained MB-dMGC-CWTFFNet is transformed into a practical seizure warning system by a post-processing strategy [33]. Specifically, after the consecutive EEG signals are fed into the well-trained MB-dMGC-CWTFFNet, the probability series $P(i)$ of the $i$-th epoch belonging to the pre-ictal class is generated. We then apply a moving average filter to $P(i)$ to reduce oscillation and obtain the smoothed probability series $P_s(i)$ over time [32]. The lengths of the moving average filter are set to 15 s and 25 s for the CHB-MIT and our Xuanwu datasets respectively, which will be discussed with the experimental results in Section IV-B.

A. Dataset Description
The performance of the proposed MB-dMGC-CWTFFNet is evaluated on two epileptic datasets, which are described as follows:
1) CHB-MIT Scalp EEG Dataset [34]: The CHB-MIT dataset contains scalp EEG signals from 23 patients, recorded at the Children's Hospital Boston with 18 common electrodes and sampled at 256 Hz. In this study, patients with at least two seizures and three hours of inter-ictal recordings were selected for the patient-specific evaluation of seizure prediction [4]. In addition, the neural recordings within two hours after a seizure are removed to exclude the effect of the post-ictal period [32]. In particular, if several seizures cluster within two hours, only the prediction of the first seizure is considered an effective evaluation, because a successful warning depends on whether the model can predict the leading seizure [35].
2) Xuanwu Intracranial sEEG Dataset: The Xuanwu dataset was collected by the Xuanwu Hospital of Capital Medical University, Beijing, China. It consists of sEEG recordings from intracranial depth electrodes for 5 focal epilepsy patients, sampled at 256 Hz with 15 channels. As shown in Table I, there are 16 seizures in total, and the sEEG recording duration for these patients is 42 hours. The labels of the inter-ictal, pre-ictal and ictal states were marked by professional clinicians. This study was approved by the Ethics Committee of Xuanwu Hospital, Capital Medical University (LYS2018041) in Beijing and complied with the ethical standards of the Declaration of Helsinki. Informed consent was obtained from all patients.
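The leading-seizure rule described for the CHB-MIT dataset can be sketched as a simple filter over seizure onset times. The sketch assumes that the two-hour gap is measured between consecutive onsets (rather than from the end of the previous seizure), which is a simplification; `leading_seizures` is a hypothetical helper.

```python
def leading_seizures(onsets_h, min_gap_h=2.0):
    """Keep a seizure only if it starts at least `min_gap_h` hours after the previous one."""
    kept = []
    previous = None
    for onset in sorted(onsets_h):
        if previous is None or onset - previous >= min_gap_h:
            kept.append(onset)
        previous = onset  # the gap is always measured from the latest seizure
    return kept


# Illustrative onset times in hours: the seizures at 2.5 h and 3.0 h follow
# another seizure within two hours, so only the leading ones are evaluated.
print(leading_seizures([1.0, 2.5, 3.0, 8.0]))
```

Measuring the gap from the latest seizure, not from the latest kept seizure, matches the removal of all recordings within two hours after any seizure.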

B. Experimental Settings and Evaluation Metrics
In this study, following recent research [4], [11], the EEG signals from both the CHB-MIT and Xuanwu datasets are cropped into 5-second clips before being fed into the proposed MB-dMGC-CWTFFNet. Additionally, the pre-ictal period is commonly defined as the 15 minutes before seizure onset in recent methods [4], [32]. Thus, we adopt the identical setting for the pre-ictal period, and the inter-ictal period is defined as the period at least 2 hours before seizure onset and at least 2 hours after seizure ending [10].
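The clip and period definitions above can be illustrated with a short labeling sketch. It assumes non-overlapping 5-second clips and a recording that contains a single seizure; the recording layout and the helper `label_clips` are hypothetical and serve only to make the definitions concrete.

```python
def label_clips(onset_s, end_s, total_s, clip_s=5,
                preictal_s=15 * 60, gap_s=2 * 3600):
    """Label each non-overlapping clip of a single-seizure recording.

    Returns one entry per clip: "pre-ictal", "inter-ictal", or None for
    clips that are not used (ictal, post-ictal, or too close to the seizure).
    """
    labels = []
    for start in range(0, total_s - clip_s + 1, clip_s):
        stop = start + clip_s
        if onset_s - preictal_s <= start and stop <= onset_s:
            labels.append("pre-ictal")
        elif stop <= onset_s - gap_s or start >= end_s + gap_s:
            labels.append("inter-ictal")
        else:
            labels.append(None)
    return labels


fs = 256
samples_per_clip = fs * 5  # S = 1280 sampling points per 5-second clip

# Hypothetical 6-hour recording with a 60-second seizure starting at hour 3.
labels = label_clips(onset_s=3 * 3600, end_s=3 * 3600 + 60, total_s=6 * 3600)
print(labels.count("pre-ictal"), labels.count("inter-ictal"))
```

In this example the 15-minute pre-ictal window yields 180 clips, far fewer than the inter-ictal clips, which illustrates the class imbalance that has to be handled during training.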
To conduct a comprehensive performance evaluation of the proposed MB-dMGC-CWTFFNet, patient-specific leave-one-out cross-validation (LOOCV) [20] is employed in this study. Since the inter-ictal period is much longer than the pre-ictal period in the model training stage, the inter-ictal clips are randomly down-sampled to the same number as the pre-ictal clips [8]. Then, assuming that there are $N_i$ seizures in total for the $i$-th patient, in each leave-one-out loop $N_i - 1$ seizures are used for training while the remaining one is used for testing. During the training stage, cross-validation is used to divide the data into a training set and a validation set. This is repeated for $N_i$ loops until the proposed model completes the prediction evaluation of all seizures for the $i$-th patient. Since the number of seizures and the recording duration both differ across patients in the two datasets, for each leave-one-out loop the number of training samples varies from 3028 to 18241, the validation data size varies from 572 to 3543, the testing data size varies from 3131 to 7508, and the total data size varies from 6731 to 29292. The proposed MB-dMGC-CWTFFNet is evaluated via four metrics: area under curve (AUC), sensitivity ($S_n$), false prediction rate (FPR/h) and the p-value. AUC mainly reflects the classification performance for the inter-ictal and pre-ictal states. $S_n$ denotes the ratio of successfully predicted seizures to the total number of seizures. FPR/h indicates the number of false alarms per hour, and the p-value measures the significance of the improvement over chance level, i.e., whether the seizure warning system is statistically significantly better than a random predictor [4].
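The leave-one-out loop over seizures and the class balancing step can be sketched as follows. Clips are represented by plain integers and the helper names are hypothetical; the split of the training data into training and validation sets is omitted.

```python
import random


def leave_one_seizure_out(seizure_ids):
    """Yield (training seizures, held-out seizure) pairs, one pair per seizure."""
    for held_out in seizure_ids:
        yield [s for s in seizure_ids if s != held_out], held_out


def balance(interictal_clips, preictal_clips, seed=0):
    """Randomly down-sample inter-ictal clips to the number of pre-ictal clips."""
    rng = random.Random(seed)
    k = min(len(interictal_clips), len(preictal_clips))
    return rng.sample(interictal_clips, k), list(preictal_clips)


# A patient with N_i = 4 seizures gives 4 loops, each training on 3 seizures.
for train_ids, test_id in leave_one_seizure_out([1, 2, 3, 4]):
    print("train on", train_ids, "test on", test_id)

# Illustrative clip pools: 1000 inter-ictal clips against 180 pre-ictal clips.
inter, pre = balance(list(range(1000)), list(range(180)))
print(len(inter), len(pre))
```

Fixing the random seed makes the down-sampled inter-ictal subset reproducible across runs, which helps when comparing ablation variants on the same folds.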

C. Overall Performance
In order to illustrate the patient-specific prediction efficiency of the proposed MB-dMGC-CWTFFNet, we compare our proposed model with the following state-of-the-art methods at the same chance level, all of which are tested on the two datasets.
1) DCNN-Bi-LSTM [8]: This is a typical deep learning method that combines a deep convolutional network with a bidirectional long short-term memory to extract the spatial and temporal features of epileptic EEG signals respectively, which are then used for seizure prediction.
2) CE-stSENet [19]: This method used a temporal-spectral squeeze-and-excitation network to capture hierarchical multi-domain representations, which introduced the attention mechanism into the epileptic seizure detection task and improved the recognition performance.
3) TS-MS-DCNN [20]: This advanced model encoded multi-scale EEG features by designing temporal and spatial multi-scale stages, and a dilated convolution block was constructed to further expand the feature receptive field and achieve EEG-based seizure prediction.

TABLE II THE PATIENT-SPECIFIC OVERALL COMPARISON OF PERFORMANCE ON CHB-MIT DATASET
TABLE III THE OVERALL COMPARISON OF PERFORMANCE ON XUANWU DATASET
The experimental results of the patient-specific comparison on the public CHB-MIT and our Xuanwu datasets are listed in Table II and Table III respectively. From Table II, we can observe that DCNN-Bi-LSTM, CE-stSENet and TS-MS-DCNN obtain average AUCs of 0.865, 0.857 and 0.890 respectively on the CHB-MIT dataset, while our proposed MB-dMGC-CWTFFNet reaches the highest average AUC of 0.935. In particular, the AUC values for Patients 1, 8, 13 and 23 are all greater than 0.985, indicating excellent performance of our method in distinguishing between the inter-ictal state and the pre-ictal state. In the seizure prediction scenario, the average sensitivity of our proposed model reaches 97.8%, which outperforms the three baseline methods by 7.1%, 11.8% and 6.3% respectively. The distinct improvement in $S_n$ demonstrates that our proposed model, once transformed into the seizure warning system, provides more successful seizure warning for an individual patient. In addition, our proposed model yields the lowest average FPR/h of 0.059, an improvement of at least 58.7% over the other methods. Meanwhile, the p-value of our seizure warning system is less than or equal to 0.001 for all patients, implying that the improvement over chance of our seizure predictor is statistically significant at the 99.9% confidence level. This indicates that our proposed MB-dMGC-CWTFFNet has significant patient-specific capability for epileptic seizure prediction.
Additionally, to further validate the effectiveness of our proposed method, Table III lists the prediction results for the five focal epilepsy patients in the Xuanwu dataset. Our proposed model clearly achieves better performance, with an average AUC of 0.984 and an average $S_n$ of 100%, which are at least 5.1% and 10.0% higher than those of the state-of-the-art models. The average FPR/h of our method is 0.079, which is lower than that of the other methods. These encouraging experimental results demonstrate the remarkable performance ($p < 0.05$) of our MB-dMGC-CWTFFNet framework in the subject-independent intracranial seizure prediction task, which makes it possible to predict seizures with implanted intracranial depth electrodes and enables more convenient treatment for refractory epilepsy patients [36].
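The two warning-level metrics reported in this subsection follow directly from their definitions and can be computed as in the sketch below. The seizure count of 16 matches the Xuanwu dataset description, whereas the false-alarm count and the inter-ictal duration are hypothetical values chosen only to show the calculation.

```python
def sensitivity(predicted_seizures, total_seizures):
    """S_n: ratio of successfully predicted seizures to all seizures."""
    return predicted_seizures / total_seizures


def false_prediction_rate(false_alarms, interictal_hours):
    """FPR/h: number of false alarms per hour of inter-ictal recording."""
    return false_alarms / interictal_hours


# All 16 seizures predicted corresponds to a sensitivity of 100%.
print(f"S_n = {sensitivity(16, 16):.1%}")

# Hypothetical example: 3 false alarms over 40 hours of inter-ictal data.
print(f"FPR/h = {false_prediction_rate(3, 40.0):.3f}")
```

Both quantities are computed per patient and then averaged, so a single patient with few seizures can shift the average sensitivity noticeably.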

A. Ablation Studies
To demonstrate the contribution of each component of our proposed MB-dMGC-CWTFFNet, ablation studies are conducted on both the CHB-MIT and Xuanwu datasets. In this subsection, we discuss the efficacy of each component by comparing the proposed method with and without it, which helps to justify its positive influence. The overall experimental results of the ablation studies are presented in Table IV, and the impacts of the MB feature extractor, the point-wise dMGCN and the CWTFFNet are demonstrated respectively as follows:

TABLE IV ABLATION STUDIES ON TWO DATASETS
1) Impact of MB Feature Extractor: In order to give a comprehensive assessment of the proposed MB feature extractor, we compare our MB-dMGC-CWTFFNet with three simplified sub-models: a) the model without temporal-conv; b) the model without spatial-encoding; c) the model without spectral-conv. From Table IV and Fig. 4, when the temporal-conv is used to extract the multi-scale temporal features on the two datasets, the AUC of our MB-dMGC-CWTFFNet increases by 2.6% and 3.1% respectively compared with the model without the temporal-conv. $S_n$ also improves by 3.0% on the CHB-MIT dataset and by 15.0% on our Xuanwu dataset. Additionally, the FPR/h declines by 79.9% and 45.9% on the two datasets respectively, which illustrates the usefulness of the temporal-conv in capturing multi-scale temporal evolution and indicates the effectiveness of the MB feature extractor in extracting fine-grained temporal features.
Meanwhile, the spatial-encoding also plays an important role in the extraction of multi-channel spatial features. For instance, compared to the model without the spatial-encoding, the AUC of the proposed model increases from 0.907 and 0.948 to 0.935 and 0.984 on the two datasets respectively, and $S_n$ improves from 94.8% and 85% to 97.8% and 100%. The FPR/h decreases from 0.337 and 0.243 to 0.059 and 0.079. This verifies that the spatial-encoding branch enables exact spatial expression with cortical multi-channel representations, which contributes to seizure prediction with distinct improvements in the performance metrics.
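To put the spatial-encoding figures above on the same relative scale as the percentage changes quoted for the other branches, the following sketch computes the relative change of AUC and FPR/h from the values stated in this paragraph. The helper `relative_change` is introduced only for this calculation.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Values quoted above: model without spatial-encoding -> full model.
auc_chbmit = relative_change(0.907, 0.935)
auc_xuanwu = relative_change(0.948, 0.984)
fpr_chbmit = relative_change(0.337, 0.059)
fpr_xuanwu = relative_change(0.243, 0.079)

print(f"AUC:   {auc_chbmit:+.1f}% (CHB-MIT), {auc_xuanwu:+.1f}% (Xuanwu)")
print(f"FPR/h: {fpr_chbmit:+.1f}% (CHB-MIT), {fpr_xuanwu:+.1f}% (Xuanwu)")
```

Under this definition the spatial-encoding branch raises the AUC by roughly 3.1% and 3.8% and lowers the FPR/h by roughly 82.5% and 67.5% on the CHB-MIT and Xuanwu datasets respectively.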
In addition to the above two branches of the MB feature extractor, the effect of the spectral-conv is further discussed. From Table IV, we find that the proposed method with spectral-conv shows better performance. For the CHB-MIT dataset, its AUC and $S_n$ are 4.8% and 4.4% higher than those of the model without spectral-conv, and the FPR/h declines by 88.5%. Similarly, when using the spectral-conv on the Xuanwu dataset, our MB-dMGC-CWTFFNet achieves improvements of 2.3% in AUC and 5.0% in $S_n$ over the model without spectral-conv, and its FPR/h is reduced by 37.8%. These evaluation results also show that the designed spectral-conv can extract comprehensive spectral characteristics in the five clinical physiological rhythms, which facilitates the construction of the patient-specific MB feature extractor in combination with the temporal-conv and spatial-encoding branches.
Moreover, to further validate the superiority of the proposed MB feature extractor intuitively, t-SNE is applied to visualize the temporal-spatial-spectral features extracted by the models with and without the MB feature extractor. The t-SNE visualization of inter-ictal and pre-ictal features in a 2D embedding space on the two datasets is shown in Fig. 5. We can see that the binary-class feature distributions learned by the MB feature extractor present better discrimination than those of the model without the MB feature extractor on both the CHB-MIT and Xuanwu datasets. Especially for the models without spatial-encoding, some inter-ictal and pre-ictal features are mixed together. In contrast, the proposed model using the MB feature extractor obtains more discriminative features, reflected in a visible inter-class distance and a dense intra-class distribution on both datasets. These observations also explain why the best seizure prediction performance is produced by combining the temporal-conv, the spatial-encoding and the spectral-conv branches simultaneously, which illustrates the contribution of the MB feature extractor in jointly extracting multi-level fine-grained features.
2) Influence of Point-Wise dMGCN: To assess the contribution of the proposed point-wise dMGCN, we perform an ablation experiment to investigate its influence on patient-specific seizure prediction. Fig. 6 compares the AUC of the proposed model with and without point-wise dMGCN for each patient. For the CHB-MIT dataset, the average AUC of our MB-dMGC-CWTFFNet is 7.3% higher than that of the model without point-wise dMGCN. In particular, a maximum AUC increase of 0.18 (about 22.5% improvement) occurs on Patient 13. For the Xuanwu dataset, the AUC of our model increases by about 7.5% compared to the model without point-wise dMGCN. As shown in Table IV, after employing point-wise dMGCN, the Sn of our model exceeds that of the ablation model by 8.1% and 16.7% on the two datasets, respectively, and the FPR/h decreases from 0.326 to 0.059 on CHB-MIT and from 0.651 to 0.079 on Xuanwu. These gains give substantial evidence that point-wise dMGCN can better learn the three-view graph structures with spatial position guidance and extract deep multi-domain graph features, which improves the overall performance of seizure prediction warning.
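The ablation figures above mix absolute differences (for example, FPR/h falling from 0.326 to 0.059) with relative ones (a 0.18 AUC gain described as about 22.5%). The following minimal sketch shows how the two readings relate. The helper name `relative_change` is hypothetical, and the baseline AUC at the end is inferred from the two quoted numbers rather than stated in the text.

```python
def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value; negative means a decrease."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before


# FPR/h without -> with point-wise dMGCN, as quoted in the text.
fpr_per_hour = {"CHB-MIT": (0.326, 0.059), "Xuanwu": (0.651, 0.079)}

for dataset, (before, after) in fpr_per_hour.items():
    print(f"{dataset}: absolute {after - before:+.3f}, "
          f"relative {relative_change(before, after):+.1%}")

# Inferred, not stated in the text: a 0.18 AUC gain that corresponds to a
# 22.5% relative improvement implies a baseline AUC of 0.18 / 0.225.
implied_baseline_auc = 0.18 / 0.225
```

Under this reading, the FPR/h reductions correspond to relative decreases of roughly 81.9% on CHB-MIT and 87.9% on Xuanwu.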
3) Efficacy of CWTFFNet: To fuse the dynamic multi-domain graph features, the CWTFFNet is adopted to integrate the local and global representations based on the channel-weighted self-attention mechanism. Therefore, we further compare our MB-dMGC-CWTFFNet with the model without CWTFFNet. The results of the ablation experiment on the two datasets are presented in Fig. 7. To further evaluate how the proposed CWTFFNet contributes to seizure prediction performance, we compare the prediction time with and without CWTFFNet on CHB-MIT and Xuanwu, and the results are shown in Fig. 8. The models with and without CWTFFNet both successfully predict the seizure within the pre-ictal period. However, for the same seizure from the CHB-MIT dataset, the model without CWTFFNet issues its prediction only 5 minutes before seizure onset, whereas our proposed model with CWTFFNet predicts 9 minutes in advance. For the Xuanwu dataset, the prediction time improves from 13 minutes to 15 minutes before seizure onset. These improvements in prediction time reflect the contribution of CWTFFNet. Specifically, the channel-weighted transformer in our CWTFFNet reinforces the learning of the model through the multi-channel-weighted self-attention mechanism. Accordingly, the fused features, which contain local multi-graph structures and global channel information, are more conducive to seizure prediction.

B. Parameters Analysis of Post-Processing
In this subsection, we analyze the influence of two post-processing hyperparameters, the filter length and the threshold ω, on the seizure warning system built from our MB-dMGC-CWTFFNet. In the post-processing, the moving average filter smooths the probability outputs of our model by filtering out outliers, which yields practical seizure prediction results. The filter length of the moving average filter is varied from 5 to 60 with a step of 5 (unit: second), and the corresponding variations of Sn and FPR/h on the two datasets are shown in Fig. 9(a). Interestingly, on both datasets a larger filter length leads to unsatisfactory Sn, while a smaller filter length leads to poor FPR/h. The main reason is that a large filter length may over-smooth the prediction results, so some short-duration warnings are probably missed. In contrast, a small filter length may retain more predicted outliers, which greatly increases the probability of false alarms [32]. Consequently, to balance Sn and FPR/h, the filter lengths are set to 15 for the CHB-MIT dataset and 25 for the Xuanwu dataset.
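The trade-off described above can be illustrated with a minimal moving-average sketch. It assumes one model output per second and a causal window (each smoothed value averages the most recent outputs only); neither detail is specified in this section, and the probability sequence is made up for illustration.

```python
from collections import deque
from typing import List, Sequence


def moving_average(probs: Sequence[float], filter_length: int) -> List[float]:
    """Causal moving average over the last `filter_length` outputs.

    The first few values average over however many samples exist so far.
    """
    if filter_length < 1:
        raise ValueError("filter_length must be at least 1")
    window = deque(maxlen=filter_length)
    running_sum = 0.0
    smoothed = []
    for p in probs:
        if len(window) == filter_length:
            running_sum -= window[0]  # value about to be evicted by append
        window.append(p)
        running_sum += p
        smoothed.append(running_sum / len(window))
    return smoothed


# Hypothetical per-second outputs: one isolated outlier, then a 10 s burst.
probs = [0.1] * 20 + [0.95] + [0.1] * 20 + [0.9] * 10 + [0.1] * 20

unfiltered = moving_average(probs, 1)      # outlier survives unchanged
short_smoothed = moving_average(probs, 5)  # outlier damped, burst preserved
long_smoothed = moving_average(probs, 60)  # burst is averaged away as well
```

With a length of 5 the isolated outlier is pulled well below a 0.6 threshold while the burst still exceeds it; with a length of 60 the burst is flattened too, which mirrors the loss of short-duration warnings noted above.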

TABLE V EXPERIMENTAL SETTINGS AND PERFORMANCE COMPARISON OF THE STATE-OF-THE-ART METHODS ON CHB-MIT
Since the seizure warning depends on whether the predicted probability exceeds the threshold ω, we examine the sensitivity of the proposed model to this hyperparameter. The threshold ω is varied from 0.1 to 0.9 with a stride of 0.1, and the trade-off between Sn and FPR/h is displayed in Fig. 9(b). The trends of the two evaluation metrics with respect to ω are similar to those observed for the filter length. The best trade-off between Sn and FPR/h appears at a threshold of 0.6 on both datasets. Thus, to achieve the best seizure prediction after post-processing, we fix ω at 0.6 for the CHB-MIT and Xuanwu datasets, which is consistent with existing studies [9], [32].
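A threshold sweep of this kind can be sketched as follows. The alarm rule used here (an alarm is raised each time the smoothed probability rises above ω) and the one-sample-per-second assumption are illustrative choices, not details confirmed by the text, and the function names and the probability sequence are hypothetical.

```python
from typing import Dict, List, Sequence


def alarm_onsets(smoothed: Sequence[float], omega: float = 0.6) -> List[int]:
    """Indices at which the smoothed probability rises above omega."""
    onsets = []
    above = False
    for t, p in enumerate(smoothed):
        if p > omega and not above:
            onsets.append(t)  # rising edge: a new alarm starts here
        above = p > omega
    return onsets


def false_predictions_per_hour(n_false_alarms: int,
                               interictal_seconds: float) -> float:
    """FPR/h: false alarms divided by inter-ictal recording time in hours."""
    if interictal_seconds <= 0:
        raise ValueError("interictal_seconds must be positive")
    return n_false_alarms / (interictal_seconds / 3600.0)


# Sweep omega from 0.1 to 0.9 in steps of 0.1, as in the text.
thresholds = [k / 10 for k in range(1, 10)]

# Hypothetical smoothed probabilities, one value per second.
smoothed = [0.2, 0.35, 0.7, 0.8, 0.55, 0.3, 0.65, 0.95, 0.4]

alarms_by_threshold: Dict[float, List[int]] = {
    w: alarm_onsets(smoothed, w) for w in thresholds
}
```

Counting the alarms that fall in pre-ictal windows gives Sn, and passing the count of alarms in inter-ictal data to `false_predictions_per_hour` gives FPR/h for each ω.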

C. Performance Comparison of the State-of-the-Art Methods
The performance of state-of-the-art seizure prediction methods on the CHB-MIT dataset is summarized in Table V. To discuss the advantages of our proposed model, we conduct a comparative analysis of these methods. For example, Truong et al. [10] and Yang et al. [35] both employed the short-time Fourier transform (STFT) in CNN-based seizure prediction frameworks, which were tested on 13 patients and achieved sensitivities of 81.2% and 89.25%, respectively, lower than that of our MB-dMGC-CWTFFNet. This is mainly because our proposed MB feature extractor can extract multi-level fine-grained features, unlike traditional time-frequency feature extraction methods. Compared with two deep learning methods using spectral power [4] and common spatial pattern (CSP) [11], respectively, our proposed method applies the point-wise dMGCN to learn three-view graph structures and capture deep multi-domain graph features, so it yields Sn values 10.8% and 5.81% higher and FPR/h values 0.127 and 0.061 lower, respectively. Unlike the study [20] that fused multi-scale temporal-spatial features with an attention-based dilated CNN, our MB-dMGC-CWTFFNet introduces the local and global channel-weighted strategy into the multi-head self-attention units, which benefits efficient feature fusion for complex graph structures and outperforms the TS-MS-DCNN by 4.51% in Sn. Although some advanced methods [9], [15] achieved competitive performance in seizure prediction, their 10-fold CV validation scheme shuffled the original EEG signals and destroyed the temporal continuity of epileptic activity, which makes it less suited to real-time seizure warning than our LOOCV scheme. Additionally, compared with the model proposed by Liang et al. [37], our MB-dMGC-CWTFFNet achieves 9.01% higher Sn and 0.123 lower FPR/h. Because our proposed method considers multi-domain variable information and constructs the multi-graph framework, it compensates for the partial loss of domain information caused by the feature alignment strategy in SSDA-SPM. In summary, compared to most existing studies, the main differences and advantages of our MB-dMGC-CWTFFNet are that it dynamically learns changes in multi-graph topologies with spatial position guidance and that it efficiently fuses multi-domain graph features through the channel-weighted multi-head self-attention mechanism.
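The contrast between shuffled 10-fold CV and LOOCV can be made concrete with a small sketch. It assumes that folds are formed per seizure, so that each held-out recording stays contiguous in time; this section does not spell out how the folds are built, and the function name is hypothetical.

```python
from typing import Iterator, List, Tuple


def leave_one_seizure_out(n_seizures: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (training seizure indices, held-out seizure index) per fold.

    Each seizure's recording is kept whole, so no sample from the held-out
    seizure leaks into training, unlike a shuffled k-fold split.
    """
    if n_seizures < 2:
        raise ValueError("at least two seizures are needed for LOOCV")
    for held_out in range(n_seizures):
        train = [i for i in range(n_seizures) if i != held_out]
        yield train, held_out


# Example: a hypothetical patient with four recorded seizures.
folds = list(leave_one_seizure_out(4))
for train, held_out in folds:
    print(f"train on seizures {train}, test on seizure {held_out}")
```

Because the held-out data are never interleaved with training samples, the evaluation resembles deployment, where the model must warn about a seizure it has not seen.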

D. Limitations and Future Directions
Although our proposed prediction framework achieves satisfactory seizure prediction performance, two limitations remain in our study. First, our MB-dMGC-CWTFFNet can realize end-to-end seizure warning without complicated EEG pre-processing, but artifacts in epileptiform discharges and potential bad channels may interfere with the predictor and cause some false positives in practical warning scenarios. Therefore, we will explore adaptive channel selection [38], [39] and unsupervised artifact removal algorithms [40] to further eliminate redundant information in raw epileptic signals. Second, our proposed method conducts patient-specific seizure prediction by training on the same patient's data, and it is difficult to fine-tune the model across patients. Thus, in future work we will combine domain-adversarial transfer learning strategies [41], [42] with our seizure prediction framework to handle the distribution drift between the target domain and the source domain and to support cross-patient seizure prediction.

V. CONCLUSION
In this study, we propose a novel EEG-based MB-dMGC-CWTFFNet framework for patient-specific seizure prediction. The MB feature extractor is adopted to effectively capture multi-level fine-grained representations in multiple domains. The designed point-wise dMGCN is further employed to dynamically learn the deep graph structures with spatial position guidance, which contributes to extracting multi-domain graph features from the temporal, spatial and spectral views. Finally, the CWTFFNet uses the local and global channel-weighted strategy to facilitate the efficient fusion of high-level graph features. Furthermore, we conduct comparative experiments on two epileptic datasets, and the results show that our proposed MB-dMGC-CWTFFNet obtains better evaluation metrics: its AUC, Sn and FPR/h reach 0.935 and 0.984, 97.8% and 100.0%, and 0.059 and 0.079 on the CHB-MIT and Xuanwu datasets, respectively, outperforming the state-of-the-art methods. These findings demonstrate the strong performance of our proposed MB-dMGC-CWTFFNet in patient-specific seizure prediction and indicate its potential application in neurostimulation treatment of patients with refractory epilepsy.
is the initialized adjacency matrix from the channel position encoder, $\mathrm{GAP}(\cdot)$ represents the global average pooling, $\mathrm{LN}(\cdot)$ means the layer normalization, $\mathrm{ReLU}(\cdot)$ is the rectified linear unit, $F^{1}_{fc}$ and $F^{2}_{fc}$ denote the FC layers, and $\mathrm{FM}(\cdot)$ is a feedforward module (including two feedforward layers and an ELU activation). Therefore, we acquire the fused features $Z_{fused} \in \mathbb{R}^{C \times d_K}$ through the constructed CWTFFNet. Finally, two fully connected layers are used to decode the fused features. The fused features are flattened into a one-dimensional tensor and fed into the fully connected layers; the classification probabilities of the pre-ictal and inter-ictal states are then estimated by the Softmax function, and the index of the maximum probability gives the final seizure prediction result. Moreover, a cross-entropy loss function is employed for the patient-specific training of the proposed MB-dMGC-CWTFFNet, and the cross-entropy loss $L_{CE}$ between the prediction result and the label is minimized by:
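The decoding step described above (Softmax probabilities, arg-max decision, cross-entropy against the label) can be sketched in plain Python as follows. This is the standard two-class formulation under stated assumptions, not a reproduction of the paper's own equation: the class ordering (0 = inter-ictal, 1 = pre-ictal), the function names and the example logits are all hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the outputs of the last FC layer."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def predicted_class(probs: Sequence[float]) -> int:
    """Index of the largest probability, taken as the final decision."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical logits for one EEG segment; assumed class order:
# index 0 = inter-ictal, index 1 = pre-ictal.
logits = [0.2, 1.5]
probs = softmax(logits)
loss = cross_entropy(probs, label=1)
decision = predicted_class(probs)
```

Averaging `cross_entropy` over the training segments of one patient gives the quantity that patient-specific training would minimize in this formulation.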

Fig. 4. Performance comparison of AUC between the models with and without MB feature extractor on CHB-MIT and Xuanwu datasets.

Fig. 5. The t-SNE visualization in 2D embedding space of inter-ictal and pre-ictal features by comparing the models with and without MB feature extractor on CHB-MIT and Xuanwu datasets.

Fig. 6. Performance comparison of AUC between the proposed models with and without point-wise dMGCN on (a) CHB-MIT dataset and (b) Xuanwu dataset.

Fig. 7. Performance comparison of AUC between the proposed model with and without CWTFFNet on (a) CHB-MIT dataset and (b) Xuanwu dataset.

As shown in Fig. 7, the CWTFFNet increases the average AUCs from 0.880 and 0.936 to 0.935 and 0.984 on the two datasets, respectively. The standard deviations are 0.013 and 0.032 lower than those of the model without CWTFFNet, which indicates the better generalization performance of our proposed CWTFFNet across multiple patients. For Patient 7 from CHB-MIT and Patient 3 from Xuanwu in particular, the AUCs improve by 21.0% and 13.7%, respectively. In addition, compared to the model without CWTFFNet in Table IV, the proposed model gains 7.5% higher Sn on CHB-MIT and 13.3% higher Sn on Xuanwu, and the FPR/h values decline by 0.135 and 0.227 on the two datasets after utilizing CWTFFNet. It proves the advantage of incorporating the

Fig. 8. Performance comparison of the seizure prediction time with and without CWTFFNet from (a) CHB-MIT and (b) Xuanwu dataset.

Fig. 9. Performance comparison of Sn and FPR/h with different post-processing parameters. (a) filter length; (b) threshold.