Epileptic Seizure Detection Based on Path Signature and Bi-LSTM Network With Attention Mechanism

Automatic seizure detection using electroencephalogram (EEG) can significantly expedite the diagnosis of epilepsy, thereby facilitating prompt treatment and reducing the risk of future seizures and associated complications. While most existing EEG-based epilepsy detection studies employ deep learning models, they often ignore the temporal relationships between different EEG channels. To tackle this limitation, we propose a novel automatic epilepsy detection method that combines the path signature with a Bidirectional Long Short-Term Memory (Bi-LSTM) neural network with an attention mechanism. The path signature algorithm extracts discriminative features that capture the dynamic dependencies between different EEG channels, while the attention-based Bi-LSTM further analyzes the temporal dependencies inherent in the EEG features. Our method is evaluated on two public EEG databases of different sizes (CHB-MIT and TUEP) and a private database from a local hospital, under two experimental settings: five-fold cross-validation and leave-one-out cross-validation. Experimental results show that our method achieves average accuracies of 99.09%, 95.60%, and 99.87% on CHB-MIT, TUEP at 250 Hz, and TUEP at 256 Hz, respectively. On the private dataset, our method achieves 99.40% average accuracy, outperforming other methods. Furthermore, cross-patient experiments demonstrate that our method is robust across patients.


I. INTRODUCTION
Epilepsy is a chronic neurological disorder characterized by recurrent and uncontrollable seizures, and it affects people of all ages. Seizures can cause uncontrolled convulsive movements and momentary loss of consciousness, which are extremely harmful to the mental and spiritual well-being of the patient. Most patients also suffer from other comorbidities of epilepsy, such as memory loss, depression, and other psychiatric disorders [1]. Therefore, accurate and timely epilepsy detection is crucial to enable timely medication and reduce the risk of future epilepsy-related complications. Electroencephalography (EEG) captures complex dynamic brain responses and can be used to identify the type of epilepsy, contribute to the diagnosis of epilepsy syndromes, and help assess the risk of epilepsy recurrence [2]. However, epilepsy detection is mainly performed by neurologists through manual interpretation of EEG recordings. Visual inspection of large volumes of EEG is time-consuming and inefficient, placing a heavy burden on medical professionals. To improve the quality of life of epileptic patients and reduce the workload of healthcare workers, a reliable automatic seizure detection system is needed.
The identification of epileptic and non-epileptic EEG signals is a classification problem. It involves extracting discriminative features from the EEG signal and then classifying them, so selecting appropriate features from the raw data is essential. Automatic epilepsy detection methods can generally be divided into two categories according to how the signal features are generated: methods based on manual feature selection and methods based on deep learning.
Traditional methods based on manual feature selection mainly use one or a combination of time-domain, frequency-domain, time-frequency-domain, and nonlinear features of EEG signals [3]. Bhattacharyya et al. proposed two methods, a tunable-Q wavelet transform-based multiscale entropy measure [4] and a multichannel extension using the empirical wavelet transform (EWT) [5], to extract time-frequency and nonlinear representations of different EEG components, enabling effective detection of epileptic seizures on different datasets. Makaram et al. [6] extracted the multiscale entropy of different frequency bands obtained by empirical mode decomposition (EMD) as EEG features for the automatic detection of epileptic seizure patterns. Furthermore, the study in [7] used O-splines to implement Taylor-Fourier filter banks that separate epileptic signals into rhythms; the energy features of the different rhythms were combined and evaluated with an LS-SVM classifier. Recently, to address the limitations of the adaptive EMD and EWT methods, Anuragi et al. [8], [9] proposed an empirical wavelet transform based on Fourier-Bessel series expansion (FBSE-EWT) to extract entropy-based features from EEG rhythms, followed by an ensemble learning classifier for the automatic classification of seizure EEG signals. The FBSE-EWT method is suitable for non-stationary signal analysis, and the entropy measures capture nonlinear characteristics, which makes Fourier-Bessel-based entropy features well suited to EEG seizure detection. More time-frequency domain techniques for analyzing EEG signals are summarized in [10]. However, most of these methods decompose EEG signals into frequency subbands before extracting features, which increases memory requirements and processing overhead. Moreover, they extract features on a per-channel basis and ignore the temporal dependencies between channels.
Deep learning-based methods have shown great potential in epilepsy research because of their end-to-end framework and their ability to extract deeper intrinsic features for classification. Zhou et al. [11] used a convolutional neural network (CNN) instead of manual feature extraction to extract features from raw EEG signals and distinguish ictal, pre-ictal, and interictal periods for seizure detection. Xin et al. [12] used multiscale wavelet analysis to decompose the input EEG into components of different frequency bands; these multiscale components were then fed into a CNN with an attention mechanism for further feature extraction and classification, achieving high accuracy on a three-class epilepsy classification problem. However, most current seizure detection methods rely mainly on CNNs that exploit spatial information in the EEG signal and give little consideration to its temporal structure, so CNNs may be limited in isolating and extracting temporal features of brain activity [13]. To make better use of the temporal information in EEG signals, long short-term memory (LSTM) models have been used to detect epilepsy. Li et al. [14] proposed a hybrid CNN-LSTM architecture that uses a fully convolutional network with three convolutional blocks to obtain robust seizure-related features from raw EEG data and then explores the temporal dependence inherent in the EEG signal with nested long short-term memory (NLSTM) layers. To capture the evolving dynamics of seizures, Zhang et al. [15] proposed a seizure detection method based on time-frequency analysis and a bidirectional gated recurrent unit (Bi-GRU) network that analyzes EEG information in both the forward and reverse temporal directions, thereby improving detection performance. Nevertheless, neither manually selected time-frequency and nonlinear features nor features learned by deep learning methods explicitly capture the temporal dependencies between EEG channels.
It is noteworthy that the path signature (PS) aims to extract enough information to represent a path of finite length. Because PS is an effective feature extractor, it has been widely applied in diverse fields such as financial data characterization [16], handwritten character recognition [17], and human pose estimation [18]. To illustrate the capability of the path signature more clearly, we visualize its effect. Because the input features are high-dimensional, their clustering cannot be visualized directly; t-distributed stochastic neighbor embedding (t-SNE) is a nonlinear dimensionality reduction algorithm that embeds the data in a low-dimensional space in which the clustering can be shown. Fig. 1(a) visualizes the raw EEG after t-SNE dimensionality reduction: the epileptic and normal EEG samples are mixed and difficult to distinguish. Fig. 1(b) shows the t-SNE visualization after path signature feature extraction, in which the two classes are considerably less mixed.
Features extracted through path signatures capture the correlations and temporal relations among channels along a finite-length path, which makes them more effective for detecting epileptic seizures than time-frequency-domain and nonlinear features [4], [5], [6], [7], [8], [9], [10]. The effect of path signature features is illustrated by the visualization in Fig. 1. In addition, path signatures are computationally efficient, as examined in Section III-B. Based on these considerations, this paper uses path signatures as a feature extraction method to describe the dynamic dependencies among EEG channels over time, and feeds the resulting features into a Bi-LSTM with an attention mechanism to build an effective epilepsy detection framework. Specifically, we take band-pass filtered EEG signals as input and extract features with the path signature algorithm to capture the dynamic dependencies between EEG channels. The Bi-LSTM then analyzes the temporal dependencies inherent in the EEG signal features [19], [20], [21], and the attention mechanism adjusts the weights of different features to improve the performance of the system.
In this paper, we make the following contributions: (1) We introduce the path signature algorithm to model the sequential and temporal differences between EEG channels, capturing structural information and temporal correlations in sequence data.
(2) We apply an attention-based Bi-LSTM to classify epileptic EEG. The attention mechanism extracts important features from sequences through a learned weight distribution.
(3) We evaluate the proposed model on two publicly available epilepsy datasets of different sizes (CHB-MIT and TUEP) and a private dataset. The proposed method obtains better classification accuracy than other prevalent epileptic EEG classification methods.

II. METHOD
A. Overview
The framework of the proposed epilepsy detection method is shown in Fig. 2. It consists of three parts: EEG preprocessing, path signature feature extraction, and a classification model based on a Bi-LSTM neural network with an attention mechanism. In the preprocessing stage, the EEG signal is filtered with an FIR band-pass filter with a pass band of 0.5-50 Hz to remove noise. Next, the signal is segmented into data segments with a fixed time window, and the path signature is used for feature extraction. The Bi-LSTM layer exploits temporal correlations in both directions, that is, with both past and future brain activity, to further analyze the EEG sequence. The attention layer assigns different weights to the features of different time steps, enabling the model to focus on useful information in the data and improving its accuracy. Finally, a fully connected layer performs the binary classification into seizure and non-seizure classes. The following subsections describe each component in detail.
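The filter design is not specified beyond its 0.5-50 Hz pass band. The sketch below assumes a Hamming-windowed sinc (type-I linear-phase) design; the function names and the default of 1001 taps are hypothetical choices for a 125 Hz recording, since a lower edge as low as 0.5 Hz needs a long filter to keep DC in the stop band.

```python
import cmath
import math


def fir_bandpass(low_hz, high_hz, fs, num_taps=1001):
    """Hamming-windowed sinc band-pass FIR taps (type-I linear phase)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a type-I linear-phase filter")
    if not 0 < low_hz < high_hz < fs / 2:
        raise ValueError("need 0 < low_hz < high_hz < fs / 2")
    m = num_taps - 1
    f1, f2 = low_hz / fs, high_hz / fs  # cutoffs in cycles per sample

    def sinc(x):
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

    taps = []
    for n in range(num_taps):
        k = n - m / 2  # offset from the filter centre
        # ideal band-pass = ideal low-pass at f2 minus ideal low-pass at f1
        ideal = 2 * f2 * sinc(2 * f2 * k) - 2 * f1 * sinc(2 * f1 * k)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(ideal * hamming)
    return taps


def gain_at(taps, freq_hz, fs):
    """Magnitude of the filter's frequency response at freq_hz."""
    return abs(sum(h * cmath.exp(-2j * math.pi * freq_hz * n / fs)
                   for n, h in enumerate(taps)))


def filter_signal(signal, taps):
    """Causal direct-form convolution with zero initial conditions."""
    return [sum(h * signal[i - j] for j, h in enumerate(taps) if i - j >= 0)
            for i in range(len(signal))]


taps = fir_bandpass(0.5, 50.0, fs=125)
print(round(gain_at(taps, 10.0, 125), 3), round(gain_at(taps, 60.0, 125), 3))
```

Applied causally, the filter delays the signal by (num_taps - 1) / 2 samples; filtering forward and then backward removes the delay but squares the magnitude response. For 250 Hz or 256 Hz recordings, the tap count would need to grow roughly in proportion to the sampling rate to keep the same transition width.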

B. Feature Extraction Based on Path Signature
Path signature is a method for converting paths into feature vectors.A path can be any temporally correlated data stream, which is converted into a multidimensional path by performing an embedding algorithm on the data stream and then calculating the individual terms in the path signature.In this study, the paths are the multichannel scalp EEG signal data.
The path X is a continuous mapping of finite length from the interval [a, b] to a d-dimensional vector space, i.e., $X: [a, b] \to \mathbb{R}^d$, $t \mapsto X_t$, where $X^i_t$ denotes the i-th coordinate of $X_t$, $i \in \{1, 2, \ldots, d\}$. For a d-channel EEG signal X over a given time interval [a, b], the l-th order iterated integral $S^l(X)_{a,b}$ of the EEG trajectory path X is computed as follows and has dimension $d^l$:
$$S^l(X)_{a,b} = \left( \int_{a < t_1 < \cdots < t_l < b} dX^{i_1}_{t_1} \cdots dX^{i_l}_{t_l} \right)_{i_1, \ldots, i_l \in \{1, \ldots, d\}}$$
When l = 0, the 0th order iterated integral $S^0(X)_{a,b}$ is set to 1 by convention.
When l = 1, the 1st order iterated integral $S^1(X)_{a,b}$ is the set of increments of the EEG signal X along each channel i, i.e., it reflects the voltage change of the EEG signal from the start point to the end point, and forms a d-dimensional vector:
$$S^1(X)^{i}_{a,b} = \int_{a < t < b} dX^i_t = X^i_b - X^i_a, \quad i \in \{1, \ldots, d\}$$
When l = 2, the 2nd order iterated integral $S^2(X)_{a,b}$ is the set of double iterated integrals over every ordered pair of channels (i, j), which reflects how the trajectories of different EEG channels change relative to each other; it forms a $d^2$-dimensional vector:
$$S^2(X)^{i,j}_{a,b} = \int_{a < s < t < b} dX^i_s \, dX^j_t, \quad i, j \in \{1, \ldots, d\}$$
In general, the signature of a path is a formal infinite series containing the iterated integrals of all orders l. In practice, the signature is truncated at a finite order m to keep its dimension finite, denoted $Sig^m(X)_{a,b}$:
$$Sig^m(X)_{a,b} = \left(1, S^1(X)_{a,b}, S^2(X)_{a,b}, \ldots, S^m(X)_{a,b}\right)$$
Because the l = 0 term is always 1, it carries no information and is discarded, so the path signature feature extracted from the EEG signal is
$$\left(S^1(X)_{a,b}, S^2(X)_{a,b}, \ldots, S^m(X)_{a,b}\right)$$
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
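The first two signature levels described above can be computed exactly for a sampled EEG window if the samples are joined by straight lines. The sketch below (hypothetical function names) applies Chen's identity segment by segment: each linear piece with increment δ contributes δ to level 1 and δ⊗δ/2 plus the cross term S¹⊗δ to level 2. The paper does not state which signature implementation or path preprocessing it uses, so this is an illustration rather than the authors' code.

```python
def signature_depth2(path):
    """Level-1 and level-2 signature of a piecewise-linear path.

    path: list of samples, each a list of d channel values.
    Returns (s1, s2) with s1[i] = S^1(X)^i and s2[i][j] = S^2(X)^{i,j}.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        # Chen's identity: old level-2 + (old level-1 x delta) + (delta x delta) / 2
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2


def signature_features(path):
    """Feature vector (S^1, S^2 flattened row-major), dropping the constant 1."""
    s1, s2 = signature_depth2(path)
    return s1 + [v for row in s2 for v in row]


s1, s2 = signature_depth2([[0, 0], [1, 0], [1, 1]])
print(s1, s2)  # increments [1, 1]; S^{0,1} = 1, S^{1,0} = 0 for this L-shaped path
```

The level-2 terms are order-sensitive: moving first along channel 0 and then along channel 1 gives S^{0,1} = 1 and S^{1,0} = 0, so the signature records which channel changed first, which is the cross-channel temporal information the method relies on.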
The dimension of the extracted features depends only on the number of EEG channels $n_{ch}$ and the truncation order m:
$$n_{ps} = \sum_{l=1}^{m} n_{ch}^{\,l} = \frac{n_{ch}^{\,m+1} - n_{ch}}{n_{ch} - 1}$$
where $n_{ps}$ is the dimension of the extracted features. Thus, as shown in Fig. 3, EEG slices of size $(n_{sm}, n_{ch}, n_{sp})$ are mapped by the path signature algorithm to feature vectors of size $(n_{sm}, n_{ps})$, where $n_{sm}$ is the number of samples, $n_{ch}$ is the number of signal channels, and $n_{sp} = fre \times T$ is the number of sampling points acquired within a time window T at sampling frequency fre.
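As a quick check of the dimension formula (the helper names below are hypothetical), a second-order signature gives 21 + 21² = 462 features for the 21-channel private data and 506 for 22-channel windows, regardless of the sampling rate; at m = 3 the 21-channel feature grows to 9723 dimensions.

```python
def signature_dim(n_ch, m):
    """n_ps = n_ch + n_ch**2 + ... + n_ch**m (the constant level-0 term is dropped)."""
    return sum(n_ch ** level for level in range(1, m + 1))


def feature_shape(n_sm, n_ch, fre, T, m):
    """Array shapes before and after path signature extraction."""
    n_sp = int(round(fre * T))  # sampling points per window; does not affect n_ps
    return (n_sm, n_ch, n_sp), (n_sm, signature_dim(n_ch, m))


print(feature_shape(100, 21, 125, 1, 2))  # private dataset: 1 s windows at 125 Hz
print(feature_shape(100, 22, 256, 1, 2))  # 22-channel windows at 256 Hz
```

Because the dimension grows as n_ch^m, each additional truncation order multiplies the feature size by roughly n_ch, which is why orders above two quickly become expensive.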
C. Bi-LSTM With an Attention Mechanism
1) Bi-LSTM Layer: LSTM is an extension of the RNN that mitigates the vanishing gradient problem during back-propagation; it can effectively capture long-term dependencies in time-series data and is therefore suitable for EEG signal classification. In an LSTM network, information propagates in temporal order through the LSTM units of the hidden layer. To exploit the temporal correlations both before and after seizures and to strengthen the capture of long-term dependent features in EEG, we fuse forward- and backward-propagated information to construct a Bi-LSTM-based neural network. In the Bi-LSTM model, the parameters of the forward and backward directions are independent of each other, but both directions take the same EEG feature sequence as input. At each time step, the forward and backward LSTM units compute their hidden vectors, denoted $fh_t$ and $bh_t$, respectively, which are concatenated to form the output of the Bi-LSTM model:
$$h_t = \left[\, fh_t \,;\, bh_t \,\right]$$
Fig. 4 illustrates the basic structure of the Bi-LSTM model, where $\{x_1, x_2, \ldots, x_n\}$ denote the feature vectors, n is the number of time steps, $\{fh_1, fh_2, \ldots, fh_n\}$ and $\{bh_1, bh_2, \ldots, bh_n\}$ denote the forward and backward hidden vectors, respectively, and $h_n$ denotes the vector of size $(n_{sm}, n_{fe})$ obtained by concatenating $fh_n$ and $bh_n$.
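The concatenation of the forward hidden vector fh_t and the backward hidden vector bh_t can be illustrated with a dependency-free forward pass. This is a sketch with hypothetical class names, random untrained weights, and a single layer; the experiments instead use PyTorch (hidden size 128, two layers), where `torch.nn.LSTM(..., bidirectional=True)` returns the forward and backward hidden states concatenated along the feature dimension.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class LSTMCell:
    """One LSTM direction; each gate has weights over [x_t, h_{t-1}, 1] (bias folded in)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size + 1
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_size)]
            for gate in ("input", "forget", "cell", "output")
        }

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) for row in self.weights[gate]]

    def run(self, xs):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in xs:
            z = list(x) + h + [1.0]
            i = [sigmoid(a) for a in self._affine("input", z)]
            f = [sigmoid(a) for a in self._affine("forget", z)]
            g = [math.tanh(a) for a in self._affine("cell", z)]
            o = [sigmoid(a) for a in self._affine("output", z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class BiLSTM:
    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        # forward and backward directions have independent parameters
        self.forward_cell = LSTMCell(input_size, hidden_size, rng)
        self.backward_cell = LSTMCell(input_size, hidden_size, rng)

    def __call__(self, xs):
        fh = self.forward_cell.run(xs)
        bh = self.backward_cell.run(xs[::-1])[::-1]  # read in reverse, realign to t
        return [f_t + b_t for f_t, b_t in zip(fh, bh)]  # h_t = [fh_t ; bh_t]


model = BiLSTM(input_size=4, hidden_size=3)
out = model([[0.1 * t, -0.2, 0.3, 0.05 * t] for t in range(5)])
print(len(out), len(out[0]))  # 5 time steps, 2 * hidden_size features each
```

The forward half of h_t depends only on x_1, ..., x_t and the backward half only on x_t, ..., x_n, which is how each time step sees both past and future context.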
2) Attention Mechanism: When the input signal is segmented and features are extracted, some redundant information may be included because of the abrupt onset of seizures, imprecise seizure labels, and similar factors. Therefore, the feature vectors at different time steps do not contribute equally, and a standard Bi-LSTM cannot identify which parts are more important for epilepsy detection. To address this issue, we introduce an attention mechanism into the model that assigns different weights to the features, adaptively enhancing useful information and suppressing irrelevant and redundant information.
The hidden vector $h_t$ output by the Bi-LSTM network is fed into a one-layer perceptron to obtain a new hidden representation $u_t$. A context vector $u_w$ is randomly initialized, and the similarity between $u_t$ and $u_w$ is normalized with SoftMax to obtain the attention weight $\alpha_t$. The context vector $u_w$ is learned jointly with the rest of the network during training and characterizes the importance of the feature vectors at different time steps. Finally, the output vector s is obtained as the weighted sum of $h_t$:
$$u_t = \tanh\left(W_w h_t + b_w\right)$$
$$\alpha_t = \frac{\exp\left(u_t^{\top} u_w\right)}{\sum_{k} \exp\left(u_k^{\top} u_w\right)}$$
$$s = \sum_{t} \alpha_t h_t$$
where $W_w$ is the attention weight matrix and $b_w$ is the bias.
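The attention pooling over the Bi-LSTM outputs can be sketched as follows, assuming the common tanh scoring layer u_t = tanh(W_w h_t + b_w) followed by a SoftMax over the scores u_t · u_w. The function name is hypothetical and the weights are passed in explicitly rather than learned.

```python
import math


def attention_pool(hs, W_w, b_w, u_w):
    """Attention pooling over Bi-LSTM outputs.

    hs: T hidden vectors h_t (length D); W_w: A x D matrix; b_w, u_w: length-A vectors.
    Returns the pooled vector s and the attention weights (one per time step).
    """
    # u_t = tanh(W_w h_t + b_w)
    us = [[math.tanh(sum(w * x for w, x in zip(row, h)) + b) for row, b in zip(W_w, b_w)]
          for h in hs]
    # softmax over the scores u_t . u_w, shifted by the maximum for numerical stability
    scores = [sum(a * c for a, c in zip(u, u_w)) for u in us]
    top = max(scores)
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # s = sum_t alpha_t h_t
    s = [sum(alpha * h[k] for alpha, h in zip(alphas, hs)) for k in range(len(hs[0]))]
    return s, alphas


s, alphas = attention_pool([[0.0], [2.0]], W_w=[[1.0]], b_w=[0.0], u_w=[1.0])
print(alphas)  # the second time step, with the larger activation, gets more weight
```

The weights always sum to one, so s stays on the scale of the individual h_t while emphasizing the time steps the scoring layer deems most relevant.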

D. Performance Evaluation
In this study, epileptic seizure segments form the positive class and non-seizure segments form the negative class. To evaluate the performance of the proposed framework on the three EEG datasets, we use five evaluation metrics common in this research field: sensitivity, specificity, precision, accuracy, and F1-score. They are calculated as follows:
$$Sensitivity = \frac{TP}{TP + FN} \times 100\% \tag{10}$$
$$Specificity = \frac{TN}{TN + FP} \times 100\% \tag{11}$$
$$Precision = \frac{TP}{TP + FP} \times 100\% \tag{12}$$
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{13}$$
To mitigate bias in the results and the risk of overfitting to a single split, we use five-fold cross-validation: the dataset is divided into five equal or approximately equal parts, four of which are used for training and one for testing. This process is repeated five times so that each EEG segment is tested exactly once, and the results are averaged over the five runs. To further evaluate the model independently of a particular split into training and validation sets, we assess its generalization performance with leave-one-out cross-validation (LOOCV).
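The five metrics can be computed from binary predictions as in the following sketch; the function name and the use of NaN for an empty denominator are illustrative choices. Because precision and sensitivity are already expressed as percentages here, their harmonic mean (the F1-score) is also a percentage.

```python
def detection_metrics(y_true, y_pred):
    """Seizure detection metrics in percent; label 1 = seizure (positive), 0 = non-seizure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def pct(num, den):
        return 100.0 * num / den if den else float("nan")

    sensitivity = pct(tp, tp + fn)
    specificity = pct(tn, tn + fp)
    precision = pct(tp, tp + fp)
    accuracy = pct(tp + tn, tp + tn + fp + fn)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else float("nan")
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


print(detection_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                        [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]))
```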

A. Datasets and Pre-Processing
One private dataset and two public EEG datasets are used to verify the proposed system. The private dataset was collected at the Second Affiliated Hospital of Guangzhou Medical University. It contains EDF files recorded while monitoring patients with idiopathic generalized epilepsy for more than 2000 seconds. Electrodes were placed according to the International 10-20 system. The first 21 channels of the dataset contain EEG signals, sampled at 125 Hz. Seizure onset and end positions were manually annotated by professional physicians. We segmented the continuous EEG recordings into non-overlapping 1-second windows and treated ictal and non-ictal segments as independent samples. Therefore, each sample was a 21 × 125 matrix.
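The segmentation into non-overlapping 1-second windows can be sketched as below. The text does not say how windows that straddle an annotated seizure boundary are handled, so this sketch (hypothetical function and parameter names) simply drops them; the optional step argument yields overlapping windows, which is one possible way to realize the sliding window used to augment seizure data.

```python
def segment_recording(channels, fs, win_sec=1.0, step_sec=None, seizure_intervals=()):
    """Split a multichannel recording into (window, label) samples.

    channels: list of n_ch equally long sample lists; fs: sampling rate in Hz;
    seizure_intervals: (onset_s, end_s) pairs taken from the annotations.
    """
    n_win = int(round(win_sec * fs))
    n_step = n_win if step_sec is None else int(round(step_sec * fs))
    samples = []
    for start in range(0, len(channels[0]) - n_win + 1, n_step):
        t0, t1 = start / fs, (start + n_win) / fs
        inside = any(on <= t0 and t1 <= off for on, off in seizure_intervals)
        touches = any(t0 < off and on < t1 for on, off in seizure_intervals)
        if touches and not inside:
            continue  # hypothetical rule: drop windows straddling a seizure boundary
        window = [ch[start:start + n_win] for ch in channels]  # n_ch x n_win matrix
        samples.append((window, 1 if inside else 0))
    return samples


recording = [[float(c)] * 1250 for c in range(21)]  # 21 channels, 10 s at 125 Hz
windows = segment_recording(recording, fs=125, seizure_intervals=[(3.0, 5.0)])
print(len(windows), sum(label for _, label in windows))  # 10 windows, 2 ictal
```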
The first public dataset was obtained from the CHB-MIT scalp EEG database collected at Boston Children's Hospital [22]. It contains 24 long-term EEG recordings from 22 subjects (5 males aged 3-22 years; 17 females aged 1.5-19 years) and includes multiple seizure types, such as clonic, absence, and tonic seizures. The EEG electrodes were placed according to the International 10-20 system, all signals were sampled at 256 Hz, and the beginning and end of each seizure were annotated. Because of the severe imbalance between seizure and non-seizure data, we used a sliding window of 3 seconds in the training phase to generate more seizure samples. Non-seizure samples were collected by randomly selecting slices from all available files except those containing seizure segments; these slices included data from both pre-seizure and inter-seizure periods. The data were organized into matrices of dimensions 22 × 256.
The second public dataset used in this study was the Temple University Hospital (TUH) EEG Epilepsy Corpus (TUEP) [23], a subset of a large clinical EEG collection covering 30,000 subjects. It was recorded at TUH in Philadelphia and designed to facilitate the development of automated EEG analysis using machine learning. TUEP contains 100 subjects with epilepsy and 100 subjects without epilepsy, as determined by certified neurologists. The EEG signals were recorded using the International 10-20 system, and most recordings were sampled at 256 Hz or 250 Hz. The raw data were recorded with a unipolar (referential) montage, i.e., each channel is the voltage difference between an electrode and a reference. Following the guidelines of the American Clinical Neurophysiology Society and the TUH database's official website, the recorded EEG signals were converted to a bipolar montage (the voltage difference between two electrodes) to remove signal noise and improve the spatial interpretation of the EEG signals [24]. In this experiment, we used the bipolar temporal central parasagittal (TCP) montage, which is widely used by neuroscientists. We segmented the continuous EEG recordings into non-overlapping 1-second windows and selected the 250 Hz and 256 Hz signals. Therefore, each sample was a 22 × 250 or 22 × 256 matrix.
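The conversion from referential recordings to the 22-channel bipolar TCP montage amounts to subtracting pairs of electrode signals. The pair list below follows the commonly published TCP definition and should be checked against the TUH documentation; the function name is hypothetical, and the raw channel labels in the EDF headers would first have to be mapped to these electrode names.

```python
# 22 bipolar channels of the TCP montage as (electrode_a, electrode_b): channel = a - b
TCP_PAIRS = [
    ("FP1", "F7"), ("F7", "T3"), ("T3", "T5"), ("T5", "O1"),
    ("FP2", "F8"), ("F8", "T4"), ("T4", "T6"), ("T6", "O2"),
    ("A1", "T3"), ("T3", "C3"), ("C3", "CZ"), ("CZ", "C4"), ("C4", "T4"), ("T4", "A2"),
    ("FP1", "F3"), ("F3", "C3"), ("C3", "P3"), ("P3", "O1"),
    ("FP2", "F4"), ("F4", "C4"), ("C4", "P4"), ("P4", "O2"),
]


def to_bipolar(referential):
    """referential: dict electrode name -> samples (each relative to a common reference).

    The common reference cancels in each difference, leaving electrode-to-electrode voltages.
    """
    return {f"{a}-{b}": [x - y for x, y in zip(referential[a], referential[b])]
            for a, b in TCP_PAIRS}


electrodes = sorted({e for pair in TCP_PAIRS for e in pair})
ref = {e: [1.0, 2.0] for e in electrodes}
ref["FP1"] = [5.0, 7.0]
print(len(electrodes), len(to_bipolar(ref)), to_bipolar(ref)["FP1-F7"])
```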

B. Experimental Setup
The seizure detection method is implemented in Python 3.8 with PyTorch 1.9.0. The models are trained and tested on an NVIDIA GeForce RTX 2080 Ti GPU with 11 GB of memory. Two experimental settings are used: five-fold cross-validation and leave-one-out cross-validation (LOOCV).
For the truncation order of the path signature, orders 1, 2, and 3 are compared on the private dataset. TABLE I shows that the first order yields the lowest accuracy of the three. Feature extraction with the third-order path signature takes more than 15 times as long as with the second-order path signature, while its accuracy is slightly lower. Therefore, the second-order path signature offers the best trade-off between time cost and accuracy.
In addition, as shown in TABLE II, we statistically analyzed the proposed path signature features for the seizure and non-seizure classes. This analysis reports the mean and standard deviation of the features for each class and applies Student's t-test to assess statistical significance. The p-values of the path signature features are below 0.001 for all three datasets, indicating that these features differ significantly between seizure and non-seizure cases.
Fig. 5 shows the experimental results for different sequence lengths (the number of time steps, n). As shown in Fig. 5(a) for the private dataset, the classification accuracy increases with the sequence length and reaches its maximum at 10 seconds. Fig. 5(b) shows the effect of the sequence length at both sampling frequencies of the TUEP dataset: the accuracy is highest when the sequence length is set to 60 s for both the 250 Hz and the 256 Hz data. For each CHB-MIT patient, the sequence length is set to 3 because of the limited sample size.
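Because each time step corresponds to one 1-second window, a sequence length of 10 covers 10 s of EEG. The grouping itself is not described in the text; the sketch below (hypothetical function name) assumes non-overlapping groups of consecutive windows from the same recording and leaves open how a label is assigned to each sequence.

```python
def to_sequences(features, seq_len):
    """Group consecutive per-window feature vectors into non-overlapping sequences.

    features: list of n_sm vectors (one per 1 s window) in temporal order.
    Returns n_sm // seq_len sequences of seq_len vectors each, i.e. an input of
    shape (n_sm // seq_len, seq_len, n_ps) for the Bi-LSTM; the remainder is dropped.
    """
    n_seq = len(features) // seq_len
    return [features[k * seq_len:(k + 1) * seq_len] for k in range(n_seq)]


feats = [[float(i)] * 462 for i in range(25)]  # 25 s of second-order features, 21 channels
seqs = to_sequences(feats, seq_len=10)
print(len(seqs), len(seqs[0]), len(seqs[0][0]))  # 2 sequences of 10 steps x 462 features
```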

TABLE II STATISTICAL ANALYSIS OF EXTRACTED FEATURES BASED ON PATH SIGNATURE FOR NON-SEIZURE AND SEIZURE CLASSES
Next, we tune the hidden size and the number of hidden layers of the Bi-LSTM while keeping the other parameters constant. Fig. 6 shows the accuracy curves: the accuracy is highest when the hidden size is 128 and the number of hidden layers is 2. In addition, we train for 100 epochs with the cross-entropy loss, using the Adam optimizer with a learning rate of 0.005.
Our proposed model also offers a significant advantage in computational efficiency. As shown in TABLE III, we compare the time and space complexity of our path signature-based feature extraction with commonly used signal decomposition-based methods from the literature. According to previous research [26], our method requires only O(N) precomputation and storage for a data sequence of length N, after which the signature computation can be completed in O(1) time. Compared with algorithms such as FFT, DWT, and EMD, which have O(N log N) time complexity, our method is more time-efficient with comparable space complexity. Additionally, our model has a modest 4.19M parameters and requires only 0.063 G floating-point operations (FLOPs) when the TUEP data with the largest sequence length are used as input.
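The O(N) precomputation and O(1) evaluation claim is attributed to [26] without further detail. One construction with this behavior, shown here as an illustrative sketch rather than the cited method, stores the depth-2 signature of every prefix of the path and recovers the signature of any sub-interval with Chen's identity, $S^2(X)_{s,t} = S^2(X)_{0,t} - S^2(X)_{0,s} - S^1(X)_{0,s} \otimes S^1(X)_{s,t}$. The query cost is constant in the sequence length but still grows as d² with the number of channels; the function names are hypothetical.

```python
def prefix_signatures(path):
    """One O(N) pass storing the depth-2 signature of path[0..t] for every t.

    path: list of N samples, each a list of d channel values (piecewise-linear path).
    Storage is O(N * d^2).
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    P1, P2 = [s1[:]], [[row[:] for row in s2]]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
        P1.append(s1[:])
        P2.append([row[:] for row in s2])
    return P1, P2


def interval_signature(P1, P2, s, t):
    """Depth-2 signature of samples s..t via Chen's identity, in O(d^2) regardless of t - s."""
    d = len(P1[0])
    inc = [P1[t][i] - P1[s][i] for i in range(d)]
    level2 = [[P2[t][i][j] - P2[s][i][j] - P1[s][i] * inc[j] for j in range(d)]
              for i in range(d)]
    return inc, level2


path = [[0.3 * t, ((t * t) % 5) * 0.1, -0.2 * t + (t % 3)] for t in range(20)]
P1, P2 = prefix_signatures(path)
print(interval_signature(P1, P2, 4, 15)[0])  # increments over samples 4..15
```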

C. Patient-Specific Experiments
The performance of the proposed epilepsy detection method is evaluated on the CHB-MIT dataset through patient-specific experiments. The five evaluation metrics (accuracy, sensitivity, specificity, precision, and F1-score) are summarized in TABLE IV. Averaged over the 24 cases, the accuracy, sensitivity, specificity, precision, and F1-score are 99.09%, 99.28%, 98.95%, 98.53%, and 98.89%, respectively. Nine cases reach 100% sensitivity, and 15 cases reach 99% or higher in both specificity and accuracy. These results indicate that the proposed model achieves high accuracy and stability for individual patients.

D. Ablation Experiments Based on Cross-Validation
The proposed model is evaluated with five-fold cross-validation, whose results demonstrate the overall performance of the model on the three datasets while reducing the risk of overfitting and biased results.
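A five-fold split in which every segment is tested exactly once can be sketched as follows. The text does not say whether the folds are shuffled or stratified by class, so the shuffling with a fixed seed and the function name are assumptions. If segments overlap (for example, seizure windows generated with a sliding window), splitting at the segment level can place nearly identical data in both training and test folds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) for k folds whose sizes differ by at most one."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


for train, test in k_fold_indices(23, k=5, seed=1):
    print(len(train), len(test))
```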

TABLE IV PATIENT-SPECIFIC EXPERIMENTAL RESULTS ON CHB-MIT DATASET
To validate the contribution of combining the path signature with Bi-LSTM and attention in our model, we perform full ablation experiments. For fairness, all model settings are identical; the results are shown in TABLE V-TABLE VII. Compared with Bi-LSTM, PS+Bi-LSTM improves accuracy by 2.91%, 28.85%, 28.44%, and 20.98% on the private dataset, CHB-MIT, TUEP (250 Hz), and TUEP (256 Hz), respectively. Compared with Bi-LSTM+Attention, PS+Bi-LSTM+Attention improves accuracy by 4.03%, 27.11%, 10.29%, and 8.61% on the same datasets, respectively. These results show that the path signature algorithm exploits the temporal correlation between EEG channels and substantially improves the learning ability of the model. Compared with PS+LSTM+Attention, PS+Bi-LSTM+Attention improves accuracy by 0.61%, 1.16%, 0.61%, and 0.25%, respectively, which indicates that Bi-LSTM captures the dependencies of the EEG sequence across time steps better than LSTM. Moreover, adding attention to PS+Bi-LSTM improves accuracy by 0.61%, 0.72%, 9.22%, and 0.74%, respectively.
Fig. 7 shows the raw EEG signals and the corresponding attention weights for a specific clinical case. The horizontal axis represents the time steps of the Bi-LSTM model, and the heat map of attention weights shows how strongly the model focuses on each time step. Because seizures can have an abrupt onset and seizure locations may be marked imprecisely, redundant information may be extracted. However, as the figure shows, the attention mechanism assigns higher weights to the crucial segments of the EEG signal, highlighting the time points at which characteristic epileptic waves appear during seizures. This helps identify and localize abnormalities associated with seizures, even when physicians have marked the seizure earlier than it actually occurs. The red dashed line in the figure indicates the actual seizure location. In summary, both the path signature and the attention-based Bi-LSTM play an important role in seizure/non-seizure classification, and the proposed model combining the two performs best.

E. Cross-Patient Experiments
To further evaluate the performance of the proposed model, we also use leave-one-out cross-validation (LOOCV), in which all segments of a given subject are excluded from training and used only to test the trained model. This process is repeated for each subject in the dataset. For the private and CHB-MIT datasets, positive and negative samples are taken from the seizure and non-seizure segments of the same patient, so three metrics (accuracy, sensitivity, and specificity) are calculated and shown in the figures. For the TUEP dataset, positive and negative samples come from epileptic patients and healthy subjects, respectively; each held-out epileptic patient therefore contributes only positive samples, so accuracy and sensitivity coincide for each patient in TABLE VIII and Fig. 10.
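The subject-wise LOOCV described above can be sketched as a split generator over per-segment subject identifiers; the function name and the identifier format are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """subject_ids[i] is the subject of segment i.

    Yields (held_out_subject, train_idx, test_idx); no segment of the held-out
    subject ever appears in the training indices.
    """
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


for subject, train, test in leave_one_subject_out(["a", "a", "b", "c", "c", "c"]):
    print(subject, train, test)
```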
TABLE VIII and Figs. 8-10 give the results of the leave-one-out tests. The average accuracy over the 11 subjects in the private dataset is 95.25%, the average sensitivity is 90.87%, and the average specificity is 98.26%, with 8 subjects having an accuracy above 96%. For the CHB-MIT dataset, the average accuracy over the 24 subjects is 94.84%, the average sensitivity is 91.05%, and the average specificity is 98.63%; 12 subjects have an accuracy above 95% and 21 subjects above 90%. For the TUEP dataset, the average accuracy over 86 epilepsy patients is 86.90%, with 44 subjects above 90% and 8 reaching 100%. Only a few individual patients in the three datasets have sensitivities below 80%, owing to fewer seizure segments and scalp EEG recordings that are more affected by external noise. Overall, the model shows satisfactory accuracy and robustness, which can help physicians make a diagnosis.
We compare the proposed method with CNN-based methods [11], [14], [25], [29], [31], RNN-based methods [14], [15], [19], [20], [21], [32], and other machine learning-based methods [5], [27], [28], [30], [33]. As shown in TABLE IX, on the CHB-MIT dataset our method improves accuracy by 1.59% over CNN [11] and by 3.80% over CNN+LSTM [14]; compared with these deep neural network-based methods, our method based on the path signature and Bi-LSTM with attention achieves higher detection performance. The methods of Zhang et al. [15], Hu et al. [19], Yao et al. [20], and He et al. [21] also use bidirectional recurrent networks; however, our method additionally captures the structural information in the sequence data through the path signature and therefore achieves better performance. Moreover, our method requires no signal rhythm decomposition or post-processing. We also compare our model with machine learning-based seizure detection methods [27], [28], and the accuracy and sensitivity of our method are significantly higher. Compared with [5], our approach achieves significantly higher sensitivity with comparable accuracy; furthermore, whereas [5] requires channel selection and band decomposition before feature extraction, our method requires fewer feature extraction steps and has lower time complexity (see TABLE III for details). For the TUEP dataset, the compared methods use either the 250 Hz or the 256 Hz recordings, so we validate our method at both sampling frequencies; the results are listed in TABLE X. The proposed method outperforms the existing methods.
Most current seizure detection methods are patient-specific. Because of the physiological differences between individuals with epilepsy, models built for specific patients are difficult to apply to data from unseen patients. TABLE XI compares the proposed model with the latest cross-patient seizure detection methods. Our method achieves the best performance among all compared methods, which further confirms its generalization ability.
In summary, our method achieves state-of-the-art performance among the compared methods, with the highest accuracy, sensitivity, and specificity in both patient-specific and cross-patient settings.

IV. CONCLUSION
This paper proposes a seizure detection method based on the path signature and a Bi-LSTM architecture with an attention mechanism. First, the path signature algorithm models the evolution of the path formed by the original EEG channels over time to extract discriminative features. Second, these features are fed into the Bi-LSTM model to mine temporal relationships. Third, an attention mechanism is introduced

Fig. 1. t-SNE visualization of EEG signals without processing and after path signature feature extraction.

$$F1 = \frac{2 \times Precision \times Sensitivity}{Precision + Sensitivity} \times 100\% \tag{14}$$
where true positive (TP) is the number of positive samples correctly classified; true negative (TN) is the number of negative samples correctly classified; false positive (FP) is the number of negative samples misclassified as positive; and false negative (FN) is the number of positive samples misclassified as negative.

Fig. 5. Effect of sequence length on the private dataset and the TUEP dataset.

Fig. 6. Validation accuracy of the proposed method with different hidden sizes and numbers of layers.

Fig. 7. Visualization of raw EEG signals and attention weights for one case.

Fig. 8. Results of the LOOCV for the private dataset.

TABLE III COMPARISON OF DIFFERENT ALGORITHMS IN COMPUTATIONAL COMPLEXITY

TABLE VIII STATISTICAL RESULTS OF THE LEAVE-ONE-OUT VALIDATION FOR THREE DATASETS
TABLE IX COMPARISON BETWEEN THE PROPOSED METHOD AND EXISTING EPILEPSY DETECTION METHODS ON THE CHB-MIT DATASET

TABLE X COMPARISON BETWEEN THE PROPOSED METHOD AND EXISTING EPILEPSY DETECTION METHODS ON THE TUEP DATASET

TABLE XI CROSS-PATIENT PERFORMANCE COMPARISON BETWEEN THE PROPOSED METHOD AND THE LATEST EPILEPSY DETECTION METHODS