Epileptic Seizure Forecasting With Generative Adversarial Networks

Many outstanding studies have reported promising results in seizure forecasting, one of the most challenging predictive data analysis problems. Forecasting is difficult mainly because electroencephalogram (EEG) bio-signal intensity is very small, in the $\mu \text{V}$ range, and because sensing is complicated by physiological and non-physiological artifacts. Today, accurate epileptic seizure identification and data labeling are performed by neurologists. The current unpredictability of epileptic seizure activities, together with the lack of reliable treatment for patients living with drug-resistant forms of epilepsy, creates an urgency for research into accurate, sensitive and patient-specific seizure forecasting. Most seizure forecasting algorithms use only labeled data for training. As seizure data is labeled manually by neurologists, preparing labeled data is expensive and time-consuming, making the best use of the data critical. In this article, we propose an approach that can make use of not only labeled EEG signals but also unlabeled ones, which are more accessible. We use the short-time Fourier transform on 28-s EEG windows as a pre-processing step. A generative adversarial network (GAN) is trained in an unsupervised manner where information on seizure onset is disregarded. The trained Discriminator of the GAN is then used as a feature extractor. Features generated by the feature extractor are classified for the labeled EEG signals by two fully-connected layers (which can be replaced by any classifier). This semi-supervised patient-specific seizure forecasting method achieves an out-of-sample testing area under the receiver operating characteristic curve (AUC) of 77.68%, 75.47% and 65.05% for the CHB-MIT scalp EEG dataset, the Freiburg Hospital intracranial EEG dataset and the EPILEPSIAE dataset, respectively.
Unsupervised training without the need for labeling is important because not only can it be performed in real-time during EEG signal recording, but it also does not require feature engineering effort for each patient. To the best of our knowledge, this is the first application of GAN to seizure forecasting.


Epilepsy affects almost 1% of the global population and considerably impacts the quality of life of those patients diagnosed with the disease [1]-[3].

(The associate editor coordinating the review of this manuscript and approving it for publication was Venkata Rajesh Pamula.)

Over the past two decades, a tremendous number of seizure prediction techniques with promising performance have been proposed. An early approach based on the similarity, correlation, and energy of EEG signals achieved a modest sensitivity of 42% and a false prediction rate (FPR) of less than 0.15/h, tested with the Freiburg Hospital dataset [4]. Performance improved with the use of phase coherence and synchronization information in EEG signals, resulting in a sensitivity of 60% and an FPR of 0.15/h in [5], and a sensitivity of 95.4% and an FPR of 0.36/h in [6]. A similar approach with additional features, combining bi-variate empirical mode decomposition and Hilbert-based mean phase coherence, improved sensitivity to over 70% and FPR to below 0.15/h [7]. Different from the methods above, the authors in [8] used Bayesian inversion of the power spectral density followed by a rule-based decision. Their method achieved a sensitivity of 87.07% and an FPR of 0.2/h on the Freiburg Hospital dataset.
Advances in machine learning have enabled major improvements in computer vision, language processing and medical applications [3]. A support vector machine (SVM) with frequency bands of the spectral energy as inputs further boosted performance to a sensitivity of 98.3% with an FPR of 0.29/h [9], and 98% with an FPR of less than 0.05/h [10], tested with the Freiburg Hospital dataset. In another work, features of EEG signals were estimated on a Poincaré plane using 64 fuzzy rules [11]. Principal component analysis (PCA) was applied to the features to reduce their dimensionality before classification by an SVM. This approach achieved a high sensitivity of more than 91% and an FPR below 0.08/h on the Freiburg Hospital dataset. In our recent work [12], we showed that convolutional neural networks (CNNs) can be used as an effective seizure prediction method.
Note that all high-performance seizure forecasting algorithms have been fully supervised; i.e., only labeled data were used for training. However, labeling seizure data is performed manually by neurologists and is an expensive and time-consuming task. There has been an increasing need to make use of unlabeled data with unsupervised feature learning such as clustering, Gaussian mixture models, hidden Markov models and autoencoders [13], [14]. Most of these unsupervised learning techniques have been applied to seizure detection and achieved high sensitivity and specificity [13], [15], [16]. However, few works have successfully applied unsupervised learning in the seizure forecasting context. The authors in [17] trained unsupervised stacked autoencoders (SAE), then optimized the SAE's features with principal component analysis, independent component analysis, and a differential search algorithm. These features were combined with features engineered from a priori knowledge before being classified by an SVM. This approach achieved a sensitivity of 95% and an FPR of 0.06/h, tested with a dataset of two epilepsy patients developed and released by the University of Pennsylvania and the Mayo Clinic. In another work, a deep convolutional autoencoder was used as an unsupervised feature extractor [18]. The extracted features were fed to a bidirectional long short-term memory (Bi-LSTM) network to perform the seizure prediction task. This method was tested with the CHB-MIT dataset and achieved a sensitivity of 94.6% and an FPR of 0.04/h.
In this work, we exploit a deep convolutional generative adversarial network (GAN) [19] as an unsupervised technique to extract features from unlabeled EEG signals that can be used for the seizure forecasting task. The extracted features can be classified by any classifier (a neural network with two fully-connected layers in this work). The structure of this article is as follows. We first introduce the datasets used in this work. Next, we describe how EEG signals are pre-processed. Then we provide details on GAN and how it can be used as a feature extractor for seizure forecasting. Lastly, we evaluate our approach and discuss the results on three datasets. A preliminary version of this work has been reported in [20]. The contributions of this paper include:
• Confirming that unsupervised feature learning using GAN for seizure forecasting is generalizable across multiple epilepsy EEG datasets,
• Bridging the gap between supervised and semi-supervised approaches,
• Linking patient-specific characteristics to seizure forecasting performance.

II. PROPOSED METHOD

A. DATASET
Table 1 summarizes the three datasets used in this work: the CHB-MIT dataset [21], the Freiburg Hospital dataset [22], and the EPILEPSIAE dataset [23]. The CHB-MIT dataset contains scalp EEG (sEEG) data of 23 pediatric patients, with 844 hours of continuous sEEG recording and 163 seizures. Scalp EEG signals were captured using 22 electrodes at a sampling rate of 256 Hz [21]. We define interictal periods as those at least 4 h before a seizure onset and at least 4 h after the seizure ends. In this dataset, there are cases where multiple seizures occur close to each other. For the seizure forecasting task, we are interested in predicting the leading seizures. Therefore, seizures that are less than 30 min away from the previous one are merged with it, and the onset of the leading seizure is used as the onset of the combined seizure. In addition, we only consider patients with fewer than 10 seizures a day for the prediction task, because prediction is less critical for patients having a seizure every 2 hours on average. With the above definitions and considerations, there are 13 patients with sufficient data (at least 3 leading seizures and 3 interictal hours). The Freiburg Hospital dataset consists of intracranial EEG (iEEG) recordings of 21 patients with intractable epilepsy. Due to the limited availability of the dataset, we are only able to use data from 13 patients. A sampling rate of 256 Hz was used to record the iEEG signals. In this dataset, there are 6 recording channels from 6 selected contacts: three from epileptogenic regions and three from remote regions. For each patient, there are at least 50 min of preictal data and 24 h of interictal data. More details about the Freiburg dataset can be found in [4].
EPILEPSIAE is the largest epilepsy database, containing EEG data from 275 patients [23]. In this paper, we analyze scalp EEG of 30 patients with 261 leading seizures and 2881.4 interictal hours in total. The time-series EEG signals were recorded from 19 electrodes at a sampling rate of 256 Hz. Seizure onset information obtained by two methods, EEG-based and video analysis, is provided. In our study, we use the EEG-based seizure onset information, where the onsets were determined by visual inspection of the EEG signals performed by an experienced clinician [23].

B. PRE-PROCESSING
Since we will use a generative adversarial network (GAN) architecture with three de-convolution layers, the dimensions of the GAN's input must be divisible by $2^3$, except for the number of channels. Specific to the CHB-MIT dataset, some patients have fewer than 22 EEG recording channels due to changes in electrodes. In particular, Pat13 and Pat17 have only 17 available channels, while Pat4 and Pat9 have 20 and 21 channels, respectively. Since we are interested in whether GAN can be effectively trained with non-patient-specific data, all patients must have the same number of channels so that data from all patients can be combined. We follow the approach in [24] to select 16 channels for each patient in the CHB-MIT dataset. For the CHB-MIT and Freiburg datasets, we use the short-time Fourier transform (STFT) to translate each 28-s window of time-series EEG signal into a two-dimensional matrix with frequency and time axes. For the STFT, we use a cosine window of 1-s length with 50% overlap. Most EEG recordings were contaminated by power line noise, at 60 Hz for the CHB-MIT dataset (see Fig. 1a) and at 50 Hz for the Freiburg dataset. The power line noise is removed by excluding components in the frequency ranges 47-53 Hz and 97-103 Hz when the power line frequency is 50 Hz, and in the ranges 57-63 Hz and 117-123 Hz when it is 60 Hz. The DC component (at 0 Hz) is also removed. Fig. 1b shows the STFT of a 28-s window after removing power line noise. We also trim the components at the last two frequencies, 127-128 Hz, so that the final dimension of each pre-processed 28-s window is (number-of-channels × X × Y) = (n × 56 × 112), where X and Y are the time and frequency dimensions, respectively, and n is 16, 6 and 19 for the CHB-MIT dataset, the Freiburg Hospital dataset and the EPILEPSIAE dataset, respectively.
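The pre-processing step above can be sketched with SciPy as follows. The exact STFT conventions of the original implementation (window shape, edge padding, how the frame count is reduced to 56) are not specified in the text, so the Hann window and the trimming to 56 frames are assumptions:

```python
import numpy as np
from scipy.signal import stft

FS = 256          # sampling rate (Hz)
WIN_SEC = 28      # window length in seconds

def preprocess_window(eeg, powerline=60):
    """STFT of one 28-s multi-channel EEG window with DC, power-line
    and 127-128 Hz components removed.
    eeg: array of shape (n_channels, 28 * 256).
    Returns an array of shape (n_channels, 56, 112)."""
    # 1-s window, 50% overlap; Hann stands in for the paper's "cosine window".
    f, t, Z = stft(eeg, fs=FS, window='hann', nperseg=FS, noverlap=FS // 2)
    mag = np.abs(Z)                       # (n_channels, 129, n_frames)
    # Frequency bands to drop depend on the power line frequency.
    if powerline == 60:
        bands = [(57, 63), (117, 123)]
    else:
        bands = [(47, 53), (97, 103)]
    keep = np.ones(f.shape, dtype=bool)
    keep[f == 0] = False                  # DC component
    for lo, hi in bands:
        keep &= ~((f >= lo) & (f <= hi))  # power line noise and harmonic
    keep &= f < 127                       # trim the last two bins (127-128 Hz)
    mag = mag[:, keep, :]                 # 129 - 1 - 14 - 2 = 112 bins left
    # Keep 56 time frames and move to (channels, time, frequency).
    return mag[:, :, :56].transpose(0, 2, 1)

x = preprocess_window(np.random.randn(16, WIN_SEC * FS))
print(x.shape)   # (16, 56, 112)
```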

C. GENERATIVE ADVERSARIAL NETWORK
In this paper, we use a Deep Convolutional Generative Adversarial Network (DCGAN) [19], depicted in Fig. 2, as an unsupervised feature extraction technique. The idea of training a generative adversarial network is that the Discriminator (D) and the Generator (G) compete with each other until they finally reach an equilibrium [25]. However, when we first started training the DCGAN, we observed that the Discriminator converged too fast. This prevents the Generator from learning how to generate high-quality STFT samples that are indistinguishable from real STFT samples, so the classification between generated STFT samples and original ones becomes a trivial task. To overcome this, we update the Generator twice instead of once every mini-batch, as suggested in [26], and configure an early-stopping monitor that keeps track of the loss values of the Discriminator and the Generator (defined in Eqs. 1 and 2 [25]). The Generator's input is fully connected to a hidden layer with output size 6272, which is reshaped to 64 × 7 × 14 and followed by three de-convolution layers with filter size 5 × 5, stride 2 × 2, and 32, 16 and n filters, respectively. The Discriminator consists of three convolution layers with filter size 5 × 5, stride 2 × 2, and 16, 32 and 64 filters, respectively. Fig. 4 plots the two loss values for Patient 1 from the CHB-MIT dataset when the Generator is updated once versus twice per mini-batch. One can observe that the Generator's loss (G_loss) is lower and has less variation when the Generator is updated twice, which means the generated STFT samples better resemble the original ones. A better Generator in turn helps to improve the discriminative performance of the Discriminator. The Generator and the Discriminator reach their equilibrium after around 2000 steps, where the early-stopping monitor stops the training. Note that early stopping was turned off when collecting the loss values to produce Fig. 4.
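A minimal sketch of the Generator and Discriminator described above, written against the modern TensorFlow/Keras API rather than the paper's original TensorFlow 1.4 code. The layer activations, `same` padding, and channels-last (56, 112, n) layout are assumptions the text does not pin down:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(n_channels=16):
    """Generator: 100-d noise -> dense 6272 -> 7x14x64 -> three 5x5
    stride-2 de-convolutions with 32, 16 and n filters."""
    return tf.keras.Sequential([
        layers.Dense(7 * 14 * 64),            # 6272-unit hidden layer
        layers.Reshape((7, 14, 64)),
        layers.Conv2DTranspose(32, 5, strides=2, padding='same', activation='relu'),
        layers.Conv2DTranspose(16, 5, strides=2, padding='same', activation='relu'),
        layers.Conv2DTranspose(n_channels, 5, strides=2, padding='same', activation='tanh'),
    ])

def build_discriminator():
    """Discriminator: three 5x5 stride-2 convolutions with 16, 32 and 64
    filters; the flattened 64 x 7 x 14 output doubles as the feature vector."""
    return tf.keras.Sequential([
        layers.Conv2D(16, 5, strides=2, padding='same', activation='relu'),
        layers.Conv2D(32, 5, strides=2, padding='same', activation='relu'),
        layers.Conv2D(64, 5, strides=2, padding='same', activation='relu'),
        layers.Flatten(),                     # 6272 features per window
        layers.Dense(1),                      # real/fake logit
    ])

# 7x14 doubles three times to 56x112, matching the STFT window shape.
z = tf.random.uniform((2, 100), minval=-1.0, maxval=1.0)
fake = build_generator()(z)                   # (2, 56, 112, 16)
logit = build_discriminator()(fake)           # (2, 1)
```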
The Discriminator's loss, $D_{loss}$, and the Generator's loss, $G_{loss}$, are defined as [25]:
$$D_{loss} = -\frac{1}{m}\sum_{i=1}^{m}\left[\log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)\right], \quad (1)$$
$$G_{loss} = -\frac{1}{m}\sum_{i=1}^{m}\log D\left(G\left(z^{(i)}\right)\right), \quad (2)$$
where m is the batch size (64), x is the original STFT of EEG signals, and z is sampled from the distribution U(−1, 1). We investigate the system performance in three scenarios: (1) GAN is trained with data of all patients combined (from the same dataset), (2) GAN is trained in a patient-specific fashion, and (3) GAN is trained in a patient-specific fashion with improvement. In scenario (3), similar to the dataset balancing technique proposed in [12], we generate extra samples from existing ones; as a result, the training set in scenario (3) is ten times larger than the one in scenario (2). Our model training is performed on an NVIDIA P100 graphics card using the TensorFlow 1.4.0 framework.

FIGURE 5. Definition of seizure occurrence period (SOP) and seizure prediction horizon (SPH). For a correct prediction, a seizure onset must be after the SPH and within the SOP.
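The two mini-batch losses, as the standard (non-saturating) GAN losses from [25] that the text names D_loss and G_loss, can be computed directly from the Discriminator's outputs on real and generated samples:

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """Mini-batch GAN losses: d_real = D(x), d_fake = D(G(z)), both
    arrays of m probabilities in (0, 1), where m is the batch size."""
    d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
    g_loss = -np.mean(np.log(d_fake))
    return d_loss, g_loss

# At equilibrium D(x) = D(G(z)) = 0.5, so d_loss = 2 ln 2 and g_loss = ln 2.
d_eq, g_eq = gan_losses(np.full(64, 0.5), np.full(64, 0.5))
```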

D. SEIZURE FORECASTING WITH FEATURES EXTRACTED BY DCGAN
We use the trained convolution layers in the DCGAN's Discriminator as a feature extractor. Specifically, we feed the STFT of 28-s EEG windows into the Discriminator and collect the flattened features at its last convolution layer's output (64 × 7 × 14). Those features can then be used with any classifier to perform the seizure forecasting task. In this paper, we use a simple neural network consisting of two fully-connected layers with output sizes of 256 and 2, respectively. The former layer uses a sigmoid activation function while the latter uses a soft-max activation function. Both layers have a drop-out rate of 0.5. The training of this two-layer neural network is patient-specific. We also apply a practice proposed in [12] to prevent over-fitting during the training of the neural network. In particular, we perform dataset balancing and then choose the latest 25% of preictal and interictal samples from the training set to monitor whether over-fitting occurs, using the rest to train the network.
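The two-layer classifier head can be sketched as below; the placement of the two dropout operations relative to the dense layers is an assumption, since the paper only states the rate:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier_head():
    """Two fully-connected layers on the 6272 flattened Discriminator
    features (64 * 7 * 14 per 28-s window); replaceable by any classifier."""
    return tf.keras.Sequential([
        layers.Dropout(0.5),                  # drop-out rate 0.5 (placement assumed)
        layers.Dense(256, activation='sigmoid'),
        layers.Dropout(0.5),
        layers.Dense(2, activation='softmax'),  # preictal vs. interictal
    ])

feats = tf.random.normal((4, 64 * 7 * 14))    # stand-in for Discriminator features
probs = build_classifier_head()(feats, training=False)
```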

E. SYSTEM EVALUATION
The seizure prediction horizon (SPH) and seizure occurrence period (SOP) need to be defined before estimating the system's performance. In this paper, we follow the definitions of SOP and SPH proposed in [4] (see Fig. 5). The SOP is the interval in which the seizure is expected to occur. The period between the alarm and the beginning of the SOP is called the SPH. For a correct prediction, a seizure onset must be after the SPH and within the SOP. Conversely, a false alarm arises when the prediction system returns a positive but no seizure occurs during the SOP. When an alarm is raised, it lasts until the end of the SOP. Regarding clinical use, the SPH must be long enough to allow sufficient intervention or precautions. In contrast, the SOP should not be too long, in order to reduce the patient's anxiety.
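The correctness rule above reduces to a simple interval check; the handling of the exact interval boundaries is an assumption, as the paper does not specify whether the endpoints are inclusive:

```python
def is_correct_prediction(alarm_time, onset_time, sph=5.0, sop=30.0):
    """True when a seizure onset falls after the SPH and within the SOP
    of an alarm raised at alarm_time (all times in minutes).
    Boundary inclusivity is assumed, not taken from the paper."""
    return alarm_time + sph <= onset_time < alarm_time + sph + sop

# An alarm at t=0 with SPH=5 min and SOP=30 min covers onsets in [5, 35) min.
```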
We use the area under the receiver operating characteristic curve (AUC) with an SPH of 5 min and an SOP of 30 min. To obtain a robust evaluation, we follow a leave-one-out cross-validation approach for each subject. If a subject has N seizures, (N − 1) seizures are used for the supervised training and the withheld seizure for validation. This round is repeated N times so that every seizure is used for validation exactly once. Interictal segments are randomly split into N parts; (N − 1) parts are used for training and the remaining one for validation. The (N − 1) parts are further split into monitoring and training sets to prevent over-fitting [12].
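The fold construction above can be sketched as follows; pairing the k-th withheld seizure with the k-th interictal part is an assumption, since the paper only states that both are held out:

```python
import numpy as np

def loocv_folds(n_seizures, interictal_segments, seed=0):
    """Leave-one-out folds over seizures: each fold withholds one seizure
    and one of N randomly assigned interictal parts for validation."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(interictal_segments), n_seizures)
    folds = []
    for k in range(n_seizures):
        train_seizures = [s for s in range(n_seizures) if s != k]
        train_interictal = np.concatenate(
            [parts[j] for j in range(n_seizures) if j != k])
        folds.append((train_seizures, train_interictal, k, parts[k]))
    return folds

folds = loocv_folds(5, np.arange(100))  # 5 seizures, 100 interictal segments
```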
We compare our semi-supervised learning models with the fully-supervised CNN approach reported in our previous work [12]. We also compare the forecasting performance with a random predictor. Specifically, we use the single-tailed Hanley-McNeil AUC test [27] to compare our AUC scores with the chance level (AUC = 0.5). The AUC values for the Hanley-McNeil test are calculated from all seizure forecasting scores obtained during the leave-one-out cross-validation for each patient.
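The single-tailed test against chance can be sketched using the Hanley-McNeil standard error of the AUC [27], with `n_pos` and `n_neg` the numbers of preictal and interictal samples:

```python
import math

def hanley_mcneil_p(auc, n_pos, n_neg):
    """One-tailed p-value for AUC > 0.5 using the Hanley-McNeil
    standard error of the area under the ROC curve."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    se = math.sqrt((auc * (1.0 - auc)
                    + (n_pos - 1) * (q1 - auc ** 2)
                    + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg))
    z = (auc - 0.5) / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - Phi(z)
```

For example, an AUC of 0.5 yields a p-value of exactly 0.5 (no evidence against chance), while higher AUCs with the same sample sizes yield strictly smaller p-values.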

III. RESULTS
In this section, we test our approach with three datasets: the CHB-MIT sEEG dataset, the Freiburg Hospital iEEG dataset, and the EPILEPSIAE sEEG dataset. An SOP of 30 min and an SPH of 5 min were used in calculating all metrics in this paper. Each fold of the leave-one-out cross-validation was executed twice, and average results with standard deviations are reported. Fig. 6 summarizes the seizure forecasting results; detailed results are provided in Tables 2-4.
Compared to the fully supervised CNN, GAN-NN introduces a ∼6%, ∼12% and ∼6.6% loss in AUC for the CHB-MIT sEEG dataset, the Freiburg Hospital iEEG dataset, and the EPILEPSIAE sEEG dataset, respectively. When the GAN is trained per patient (GAN-PS-NN), the average AUC drops further to 72.63%, 60.91% and 63.6% for the three datasets. This can be explained by the limited amount of data from each patient. By applying 10× up-sampling (GAN-PS-USPL-NN), the average AUC is boosted to 75.66% and 74.33% for the CHB-MIT dataset and the Freiburg Hospital dataset, respectively, which is 1-2% lower than that of GAN-NN. For the EPILEPSIAE dataset, the up-sampling technique improves the overall AUC by 2% compared to the patient-specific GAN without up-sampling (GAN-PS-NN) and by 0.7% compared to the non-patient-specific GAN (GAN-NN). Fig. 7 shows the overall seizure forecasting performance across the different models and datasets. Tables 2-4 show that our seizure forecasting method is significantly better than the chance level for most of the patients at a significance level of 0.05. The supervised and semi-supervised learning methods (namely CNN and GAN-PS-USPL-NN) outperform the random predictor for most of the patients. The percentages of patients with forecasting performance above the chance level for the two methods are (92.30%, 84.61%), (100%, 84.61%), and (86.67%, 86.67%) for the CHB-MIT dataset, the Freiburg Hospital dataset, and the EPILEPSIAE dataset, respectively.

FIGURE 6. Seizure forecasting performance for the CHB-MIT dataset (upper) and the Freiburg Hospital dataset (lower). Four methods are evaluated: (1) CNN: convolutional neural network [12]; (2) GAN-NN: unsupervised feature extraction using a generative adversarial network (GAN), with classification performed by a two-layer neural network; (3) GAN-PS-NN: similar to (2) but GAN training is patient-specific; (4) GAN-PS-USPL-NN: similar to (3) but with 10× over-sampling of samples when training the GAN.

IV. DISCUSSION
We have shown that feature extraction for seizure forecasting can be performed in an unsupervised way. Though the overall AUC degraded by ∼6-12% across the three datasets, our unsupervised feature extraction can help to minimize the costly and time-consuming EEG labeling task. Specifically, unlabeled EEG signals are used to train the GAN, and the trained Discriminator then serves as a feature extractor. Features extracted from labeled EEG data (which can be much smaller in volume than the unlabeled data) can be fed to any classifier (two fully-connected layers in our work) for the seizure forecasting task.
There is still a gap in seizure forecasting performance between the fully-supervised (CNN) and semi-supervised approaches. We argue that this is because the amount of training data for the GAN is not big enough. This argument is supported by the results of over-sampling the data when training the GAN: over-sampling the inputs helps to close the gap for some patients and boosts the overall seizure forecasting performance. It is therefore reasonable to expect that with more EEG data, the prediction accuracy can be improved further. The advantage of unsupervised feature extraction is that we can train the feature extractor (the GAN) while recording EEG data, i.e., online training, without extra effort from clinicians.
Previous works using autoencoder-based unsupervised feature extraction [17], [18] achieved sensitivities higher than 94% and FPRs lower than 0.06/h; however, these results cannot be directly compared with the performance of our method. The work in [17] not only used the unsupervised features extracted by stacked autoencoders but also features engineered from a priori knowledge. Therefore, it is not clear how much the features from the stacked autoencoders contribute to the final performance. Also, the method was tested with only two patients with intracranial EEG signals. The other work in [18] defined the preictal period as immediately preceding the ictal period, which means the seizure prediction horizon (SPH) is zero. However, from a clinical perspective, the SPH needs to be long enough to allow sufficient intervention or precautions [12].
In the field of computer vision, GAN can help to reduce the amount of labeled data without compromising classification performance [28]. Unfortunately, with the current size of the datasets available to us, we could not replicate a similar claim for seizure forecasting using GAN as an unsupervised feature extractor.
Another aspect we believe is important is how patient-specific characteristics, such as seizure type [2], [29], age, and gender, affect seizure forecasting performance, which we examine with the EPILEPSIAE dataset. In this dataset, seizures are categorized into focal aware (simple partial), focal impaired awareness (complex partial), focal to bilateral tonic-clonic (secondarily generalized tonic-clonic) and unclassified. The age of the patients ranges from 13 to 67. In terms of seizure type, focal aware seizures have the least variation in seizure forecasting performance. This observation could be helpful for clinical trial design; e.g., focusing on patients with focal aware seizures first. Regarding gender, seizure forecasting is better for female patients overall, with the exception of one female patient who has a very low AUC score (below 35%). Most interestingly, patients aged 10 to 30 have considerably higher AUC scores and less variation compared to the other groups. In fact, if we exclude the outlier patient with a very low AUC score from the 10-to-30 group, it can be seen that seizures of young patients (30 and below) can be predicted with the highest accuracy. The reason behind this observation is not clear and is beyond the scope of this article.

V. CONCLUSION
Seizure forecasting capability has been studied and improved over the last four decades. A perfect prediction is not yet available, but with current prediction performance it is already useful to provide patients with warning messages so they can take precautions for their safety. We have shown that feature extraction for seizure forecasting can be done using unsupervised deep learning, GANs in particular. Using the semi-supervised seizure forecasting approach, 61.53% of the patients in the CHB-MIT dataset, 53.84% in the Freiburg Hospital dataset and 13.33% in the EPILEPSIAE dataset have very good seizure forecasting performance (AUC above 80%). Based on our observations regarding patient-specific characteristics, female patients under thirty years old with the focal aware seizure type would benefit the most from such a seizure forecasting system.

FIGURE 1. (a) Example short-time Fourier transform (STFT) of a 28-second window. (b) STFT of the same window after removing power line noise.
The monitor stops the DCGAN training if $D_{loss}$ remains larger than $G_{loss}$ over k consecutive training batches. In this work, we used k = 20, a batch size of 64, and the Adam optimizer for gradient-based learning with a learning rate of $10^{-4}$, $\beta_1 = 0.5$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. The effect of updating the Generator twice can be verified by visualizing the loss values. In Fig. 4, we plot the Discriminator's and the Generator's loss values in two scenarios: updating the Generator (1) once and (2) twice every mini-batch, using data of Patient 1 from the CHB-MIT dataset.
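The early-stopping monitor can be sketched as a simple streak counter; resetting the streak whenever $D_{loss}$ falls back below $G_{loss}$ is an assumption, since the paper only states the stopping condition:

```python
class EarlyStopMonitor:
    """Stops DCGAN training once D_loss has stayed above G_loss for
    k consecutive mini-batches (k = 20 in this work)."""

    def __init__(self, k=20):
        self.k = k
        self.streak = 0   # consecutive batches with D_loss > G_loss

    def should_stop(self, d_loss, g_loss):
        # Extend the streak, or reset it when the condition breaks (assumed).
        self.streak = self.streak + 1 if d_loss > g_loss else 0
        return self.streak >= self.k

monitor = EarlyStopMonitor(k=20)
# In the training loop: if monitor.should_stop(d_loss, g_loss): break
```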

FIGURE 2. The Generator takes a random sample of 100 data points from a uniform distribution U(−1, 1) as input. The input is fully connected to a hidden layer with output size 6272, which is then reshaped to 64 × 7 × 14. The hidden layer is followed by three de-convolution layers with filter size 5 × 5 and stride 2 × 2; the numbers of filters of the three de-convolution layers are 32, 16 and n, respectively. The Discriminator consists of three convolution layers with filter size 5 × 5 and stride 2 × 2; the numbers of filters of the three convolution layers are 16, 32 and 64, respectively.

FIGURE 3. Seizure forecasting with features extracted by the DCGAN's Discriminator. Inputs are short-time Fourier transforms (STFT) of 28-s windows of raw electroencephalogram (EEG) signals. Features extracted by the three convolution blocks of the Discriminator are flattened and connected to a neural network consisting of 2 fully-connected layers with output sizes 256 and 2, respectively. The former fully-connected layer uses a sigmoid activation function while the latter uses a soft-max activation function. Both layers have a drop-out rate of 0.5. Note that the two-layer neural network can be replaced with any other binary classifier.

FIGURE 4. The Discriminator's and the Generator's loss values in two scenarios: updating the Generator (1) once (a-b) and (2) twice (c-d) every mini-batch, using data of Patient 1 from the CHB-MIT dataset.

FIGURE 7. Receiver operating characteristic (ROC) curves of seizure forecasting performance for different patients of the three datasets: (a) the CHB-MIT sEEG dataset, (b) the Freiburg Hospital iEEG dataset, and (c) the EPILEPSIAE sEEG dataset. Each line corresponds to one patient. Above the green dashed line: good performance; above the blue dashed line: very good performance (adapted from [1]).

TABLE 2. Seizure forecasting performance for the CHB-MIT dataset. p-values are from the single-tailed Hanley-McNeil AUC test comparing our seizure forecasting performance with the chance level (AUC = 0.5). Patients whose p-values are not highlighted in gray have seizure forecasting performance significantly better than the chance level at a significance level of 0.05.

TABLE 3. Seizure forecasting performance for the Freiburg Hospital dataset. p-values are from the single-tailed Hanley-McNeil AUC test comparing our seizure forecasting performance with the chance level (AUC = 0.5). Patients whose p-values are not highlighted in gray have seizure forecasting performance significantly better than the chance level at a significance level of 0.05.

TABLE 1. Summary of the three datasets used in this paper.

TABLE 4. Seizure forecasting performance for the EPILEPSIAE dataset. p-values are from the single-tailed Hanley-McNeil AUC test comparing our seizure forecasting performance with the chance level (AUC = 0.5). Patients whose p-values are not highlighted in gray have seizure forecasting performance significantly better than the chance level at a significance level of 0.05.