Automatic Epileptic Seizure Detection Using Graph-Regularized Non-Negative Matrix Factorization and Kernel-Based Robust Probabilistic Collaborative Representation

Automatic seizure detection system can serve as a meaningful clinical tool for the treatment and analysis of epilepsy using electroencephalogram (EEG) and has obtained rapid development. An automatic detection of epileptic seizure method based on kernel-based robust probabilistic collaborative representation (ProCRC) combined with graph-regularized non-negative matrix factorization (GNMF) is proposed in this work. The raw EEG signals are pre-processed through the wavelet transform to obtain time-frequency distribution of EEG signals as preliminary feature information and GNMF is further employed for dimension reduction, retaining and enhancing the productive feature information of EEG signals. Then, the test sample is represented using robust ProCRC that can decide whether the testing sample belongs to each class (seizure or non-seizure) by jointly maximizing the likelihood. In addition, the kernel trick is applied to improve the separability of non-linear high dimensional EEG signals in robust ProCRC. Finally, post-processing techniques are introduced to generate more accurate and reliable results. The average epoch-based sensitivity of 96.48%, event-based sensitivity of 93.65% and specificity of 98.55% are acquired in this method, which is evaluated on the public Freiburg EEG database.

worldwide live with epilepsy [1], [2]. Electroencephalogram 32 plays an important role in seizure detection, which reflects the 33 brain electrical activities [3]. However, the technique could 34 be time consuming for trained electroencephalographers to 35 perform visual inspection on the long term EEG recordings 36 over an extended period of time. Sometimes this process may 37 lead to inaccuracies [4]. Besides, inconsistent identification 38 results are possible for the same EEG recording because of 39 the subjective nature of the analysis. Therefore, automatic 40 seizure detection systems are needed for epilepsy monitoring 41 and treatment as an effective and reliable tool to aide medical 42 staff [5]. 43 In the early 1980s, the method that decomposed EEG 44 signals into half waves and extracted sharpness, slope and 45 rhythmicity for classification was one of the earliest seizure 46 detection algorithms to be introduced in [6]. Subsequently, 47 automatic seizure detection systems have generated consid-48 erable interest and various methods have been introduced 49 for epileptic detection, with varying degrees of success. 50 Srinivasan et al. [7] introduced an epileptic detection method 51 that applied the neural network and combined it with approx-52 imate entropy. Wu et al. proposed an automatic seizure 53 detection method based on complementary ensemble empirical 54 mode decomposition (CEEMD) and extreme gradient boosting 55 (XGBoost) [8]. Khan et al. combined local binary patterns 56 (LBP) and discrete wavelet transform (DWT) to perform 57 time-frequency decomposition of signals, using univariate and 58 bivariate as features for epilepsy detection [9]. 59 Recently, deep learning has also been widely applied to 60 seizure detection or prediction, which achieved good perfor-61 mance. Karácsony et al. proposed an end-to-end deep learning 62 approach based on a combination of Mask R-CNN, Inflated 3D 63 Convnet feature extraction and LSTM-FC for the classification 64 of Frontal and Temporal Lobe epileptic seizures [10]

139
• The pre-processing with the normalization based on dif-140 ferential operator, as well as the post-processing including 141 multi-label fusion rule, optimize the performance of the 142 algorithm.

143
• The competitive experiment results are achieved on the 144 long-term EEG database with low computational cost.

145
The structure of this paper is as follows: section 2 briefly 146 introduces the database background and the details of the 147 datasets. The detailed algorithm for seizure detection is then 148 exhibited in section 3. Next, the results are showed in section 149 4 and section 5 is devoted to a detailed discussion of the 150 algorithm's performance. Finally, section 6 presents the con-151 clusions for this work. The database from the Epilepsy Center of the University 155 Hospital of Freiburg was used to evaluate the automatic seizure 156 detection system proposed in this paper and this database 157 provided the EEG signals for 21 patients. A Neurofile NT 158 digital video EEG system was used to sample the raw EEG 159 signals with a 256 Hz sampling rate. A 16-bit A/D converter 160 was used to record the epilepsy activities. Besides, six chan-161 nels were selected previously by well-trained epileptologists, 162 including three extra-focal and three focal channels. The EEG 163 data recorded in the three focal channels were used in this 164 paper.

165
Based on the clinical manifestation, the onset and offset of 166 epilepsy activities were determined by well-trained epileptolo-167 gists. For each patient, the recording time containing seizures 168 ranged from 2 to 5 h because each patient had different types 169 of epilepsy. For each recorded seizure activity, the durations 170 were also different; for example, some were less than 12 s 171 while others were more than 15 min. A summary description 172 is shown in Table I. In this automatic seizure detection system, the database 175 was divided into two parts: the training set and testing set. 176 To address the imbalance quantity in the seizure and non-177 seizure signals, the number of two types of signals in training 178 set was selected according to the following rule: for most 179 patients, one or two seizure recordings were chosen and double 180 non-seizure recordings were selected to form the training set. 181 It should be noted that for patient 18, three seizures were 182 used because the durations of seizure activities were too short. 183 Then data augmentation is performed on the seizure data in the 184 way that these data are resampled twice applied the synthetic 185 minority oversampling technique (SMOTE), which makes the 186 amounts of seizure and non-seizure data equal.

187
In total, for the training set, there are 0.33 h of seizure 188 EEG recordings and 0.66 h of non-seizure recordings, which 189    Here, we used the Daubechies-4 wavelet as 211 the wavelet function, because it has shown outstanding per-212 formance in capturing EEG signals characteristics [30], [31]. 213 Then, EEG sub-signals were reconstructed using d3, d4 and 214 d5 for the detection system because the frequency for most 215 seizure activities was often below 30 Hz [32] and these 216 sub-signals produce less artifacts and background noise than 217 the other sub-signals. At the same time, if other detailed 218 coefficients were used, more data redundancy would occur and 219 more significant effects would still not be guaranteed.

220
Considering that the changes are not significant for some 221 seizure events comparing to non-seizure recordings, the nor-222 malization based on differential operator is applied to enhance 223 the contrast between the background (non-seizure data) and 224 seizure data. After normalization, EEG signals can also satisfy 225 the non-negative constraint for GNMF. Here, the operator is 226 defined as: Here, di f f refers to the first-order derivative, di f f (  Hence, GNMF was proposed to overcome this limitation.

241
For the data matrix X = [x 1 , · · ·, x N ] ∈ R M×N , NMF aims 242 at decomposing it into the product of two non-negative factors 243 as: coefficient matrix and the base matrix, respectively.

247
The Euclidean distance between V and U is used as the 248 objective function: On the basis of the manifold learning theory, the GNMF 251 algorithm was proposed in [21] in which the local geometric 252 structure information was retained after matrix factorization. 253 The objective function is introduced for GNMF as follows: The local optimal updating rules used to obtain the two 256 matrices are as follows: After GNMF, the base matrix U is obtained and we can convert 260 the raw EEG data into low-dimensional space from the original 261 space. The representation is defined as follows: In this paper, GNMF retains the main feature information 266 after dimensionality reduction by a new representation based 267 on U † . Considering EEG data will have non-negative values 268 after DWT decomposition, so the differential operator-based 269 normalization provides non-negative values that satisfy the 270 non-negative constraint for GNMF. We used the training 271 data after pre-processing to perform GNMF and obtain the 272 base matrix U in this paper. Furthermore, the pseudo-inverse 273 matrix U † was computed and performed to transform the refers to the data of k th class. According to [29], for a data 285 points y inside S, a probability is defined through a Gaussian 286 function: And y has the higher probability approaching to the S center Practically, the testing sample z may be outside the collab-293 orative subspace and, if this is the case, the probability that 294 the measure z belongs to S is defined as: Here, λ = c/k and y is in the subspace S. 298 For seizure detection, the EEG signals have two classes 299 (seizure and non-seizure), so the algorithm should decide that 300 the testing sample belongs to each class signal. For a sample 301 y inside S, the collaborative representation can be defined as To represent a seizure data 304 y having the same class label as y k , the probability can be 305 calculated as: ς is a constant.

308
In addition, for a testing EEG sample z outside S, the 309 probability can be computed as: Suppose that a common data point y can be determined 314 to maximize the joint probability and l(z) = k is also 315 independent, we can then obtain the class label of z as: By applying the logarithm operator in the previous formula, 320 the result is:

323
where γ and λ are parameters.

324
So, we obtain the probability that In many classification problems, l 1 norm has a better perfor-328 mance in enhancing the robustness, so the Laplacian kernel is 329 used and the probability is re-defined as: The iterative re-weighted least square algorithm is applied to 332 solve the sparse coefficient vector (α) for robust ProCRC and 333 so we have (16), as shown at the bottom of the next page.

334
Then, the sparse coefficient is applied to represent testing 335 EEG signals and the residual related to seizure and non-seizure 336 samples can be further obtained for classification. As an effective data processing method, the kernel method 339 is applied to improve the linear separability for EEG signals 340 and the new linear representation of the testing samples are 341 obtained after mapping. The non-linear mapping is defined as: 342 R m → R F . R m refers to the original data space and R F refers 343 to the high-dimensional space. (y i ), (y j ) R F refers to the 344 inner products for the transformed samples and is defined by 345 the kernel function as: Here, K (, ) refers to the kernel function in the original data 348 space. After mapping, the training samples y and testing 349 In addition, the representation coefficient β is re-defined for 361 the high space R F after implementation of the kernel method 362 as: After calculating the sparse representation coefficient, we can 367 compute the residual of seizure class and non-seizure class 368 data through the following definition:  For this system, we define the label "0" represents a 387 non-seizure segment and "1" represents a seizure segment.

388
To obtain the binary labels, the threshold judgement is applied 389 to the smoothing result g by compared to a fixed threshold r .

390
If g > r , the label is set to "1" (seizure segment), otherwise,  The Freiburg EEG database was adopted to evaluate the 422 proposed detection algorithm and the experiments were exe-423 cuted in the MATLAB2016 with an Inter Core processor with 424 a 2.4Hz environment on a personal computer. To evaluate the 425 proposed seizure detection algorithm, two types of evaluation 426 criteria were used in this paper. The raw EEG signals were 427 divided into 4 s segments in the pre-processing stage, so the 428 first approach was segment-based evaluation, which contained 429 three contents: sensitivity, recognition accuracy and specificity. 430 The segments detected by this method were compared with the 431 segments labeled by experts, and three measures were defined 432 as follows:

433
Sensitivity: the number of seizure segments that were 434 detected correctly by our system divided by all seizure 435 segments labeled by experts. This measure represents the 436 accurate ability to detect epilepsy seizure signals. Specificity: 437 the non-seizure segments that were labeled by our system 438 divided by all non-seizure segments labeled by experts. The 439 specificity represents the accurate ability to detect non-seizure 440 data. Recognition accuracy: EEG segment numbers that were 441 distinguished correctly divided by the total EEG segments.

442
Although high sensitivity and specificity are the goals of 443 most epilepsy detection systems, these two goals are contra-444 dictory. Specifically, improving sensitivity reduces specificity 445 as non-seizure signals may be incorrectly marked as seizure 446 signals. Meanwhile, if higher specificity is obtained, the sen-447 sitivity is reduced and some seizure activities may be missed. 448 In addition, the quantity of non-seizure data is much longer 449 than seizure data leading to a particular imbalance between 450 two types of data. Hence, the appropriate methods for each 451 stage of this algorithm are needed to solve the data imbalance 452 problem and obtain both better sensitivity and specificity in 453 this work.

454
The results evaluated by the first approach are depicted in 455  Table II. The average results that were achieved included a 456 sensitivity of 96.48%, recognition accuracy of 98.56% and 457 specificity of 98.55%. For most patients (18 out of 21), the 458 sensitivity was 100%, which means that our method could 459 detect all seizure segments for these seizure activities. The 460 lowest sensitivity of 66.67% was obtained in patient 1 and 461 19 because the short duration of seizure events resulted in 462 unclear epilepsy activities. Besides patient 10, whose results 463 had the lowest specificity of 91.31%, all other patients result 464 achieved a specificity higher than 92%, with 15 patient results 465 achieving a specificity greater than 99%.

466
The second approach was an event-based evaluation that 467 focused on the number of seizure events, not the number of 468 EEG segments. Therefore, the number of seizure activities 469 detected by this method was compared with the number of 470 activities labeled by experts. Here, the sensitivity was defined 471 as the number of seizure events correctly detected by our 472 method divided by all seizure events labeled by experts. 473 Another criterion was the false detection rate, which was used 474 to focus on the non-seizure data that were marked as seizure 475 data by our system and calculated as the average number of 476 false seizures detected per hour. The results are shown as 477 Table III. 478 In total, 56 seizure events from testing set were adopted 479 to evaluate this method and 53 events were detected correctly, 480     Fig. 3. The residual difference of one-minute recordings from patient 2; "1" refers to the residual difference of seizure data and "2" refers to the residual difference of non-seizure data.
classification results. However, selecting valid features is time-494 consuming, and the choice of classifiers is not always appro-495 priate to best distinguish seizure from non-seizure data. In our 496   ProCRC without kernel method, and 95.24% average sen-533 sitivity, 90.83% recognition accuracy and 90.82% average 534 specificity were obtained through the robust ProCRC with 535 kernel method which was shown in Fig.6. The experiment 536 using the kernel method obtained better classification results, 537 which suggests that the linear representation of the testing 538 sample becomes more accurate for classification after kernel 539 mapping.

540
To further quantify the performance of our method, we cal-541 culated the running time for each step: The training step had 542 a duration of 4.902 s, while for 1 h of EEG data, it took 543 41.54 s to obtain classification representation in the testing 544 stage. In the study of Yuan et al., the training step had a 545 duration of 15 s, while it took 1 min to obtain classification 546 representation in the testing stage for 1 h of data using kernel 547 collaborative representation [33]. Overall, the efficiency in our 548 proposed algorithm indicates that it is feasible as a real-time 549 seizure detection system.

550
Table IV provides a comparison between our method and 551 other methods that used the Freiburg database as well. 552 The seizure detection method introduced by [  and stockwell transform to perform seizure detection [36].

568
Even though 680 h of EEG data were used to evaluate the 569 performance, a lower false detection rate of 0.24/h and a 570 higher sensitivity of 98.09% were obtained, the patient 10 in 571 this database is not considered in their method, whereas we 572 analyzed data from all 21 patients. Furthermore, the time 573 complexity of the training stage is about 48 s, which is 574 relatively longer than our method with 4.902s, and their 575 method uses far more hyperparameters than our method.

576
As an efficient classification method, the CRC was intro-577 duced in [33] for epileptic detection. This method was used achieved. It shows that the recognition performance of our 590 algorithm for seizure EEG data is not bad, and has a better 591 recognition effect on non-seizure data. Tensor, as an efficient 592 method, has been used for seizure classification. Ma et al. [38] 593 calculated tensor distance as EEG feature and applied the 594 BLDA classifier for the detection algorithm. This method 595 obtained a sensitivity of 95.12% and specificity of 97.60% 596 for the EEG data from 21 patients, which were lower than 597 those obtained with our method.

598
Moreover, many deep learning algorithms have also been 599 validated and evaluated on this database. Zhou et al. proposed 600 a convolutional neural network (CNN) to distinguish ictal, 601 preictal, and interictal stages for seizure detection and achieved 602 the average accuracies of 95.4% for interictal and ictal classi-603 fication using frequency domain signals, which is lower than 604 our methods [39]. Hussain et al. used 1 D-convolutional long 605 short-term memory neural networks to automatically generate 606 customized features for better classification of ictal, interictal, 607 and preictal segments, which achieved a highest classification 608 accuracy of 99.27%. Even though the results were better than 609 our method, they used 80% of the EEG segments for training 610 the model and the remaining 20% of signals were used to test, 611 while Our algorithm requires very little training data [40]. 612 Li et al. combined the fully convolutional network and the 613 Nested Long Short-Term Memory (NLSTM) model for end-614 to-end automatic seizure detection and the average sensitivity 615 of 97.47%, specificity of 96.17%, and false detection rate 616 of 0.487/h are yielded [41]. Compared with their method, 617 a higher accuracy and a lower false detection rate are observed 618 in our proposed method. Therefore, our algorithm has compa-619 rable performance with lower algorithm complexity and does 620 not require too many labeled samples for supervised learning. 621 VI. CONCLUSION 622 An efficient system using GNMF and kernel-based robust 623 ProCRC is proposed for epileptic seizure detection. For 624 long-term raw EEG signals, GNMF is mainly applied to 625 dimensionality reduction and preserve the useful EEG feature 626 information, after which the kernel method is used to enhance 627 the linear separable representation. After performing kernel-628 based robust ProCRC, the training set is applied to represent 629 the testing samples sparsely and the residuals are compared to 630 distinguish the two classes of EEG signals and obtain the class 631 label. The classification results and low time complexity also 632 indicate the potential application value for real-time clinical 633 application of the proposed algorithm.