Classification of Focal and Non-Focal Epileptic Patients Using Single Channel EEG and Long Short-Term Memory Learning System

The process of inspecting electroencephalography (EEG) signals of patients with epilepsy to distinguish between focal and non-focal seizure sources is a crucial step prior to surgical intervention. In this paper, a deep learning approach using a long short-term memory (LSTM) algorithm is investigated for the automatic discrimination between focal and non-focal epileptic EEG signals. The study is carried out by acquiring 7500 pairs of x and y EEG channel signals from the publicly available Bern-Barcelona EEG database. The manual classification of each signal type was done visually by two board-certified electroencephalographers and neurologists. Initially, the signals of every channel are pre-processed using $z$-score normalization and Savitzky-Golay filtering. The signals are then used as inputs to a pre-defined Bi-directional LSTM algorithm for the training process. The classification is performed using k-fold cross-validation following 4-, 6-, and 10-fold schemes. Finally, the performance of the algorithm is evaluated using several metrics, together with a complete summary table of recent state-of-the-art studies in the field. The developed algorithm achieved overall Cohen's kappa ($\kappa$), accuracy, sensitivity, and specificity values of 99.20%, 99.60%, 99.55%, and 99.65%, respectively, using the x channels and the 10-fold cross-validation scheme. The study paves the way toward implementing deep learning algorithms for EEG signal identification in a clinical environment to overcome human errors resulting from visual inspection.


I. INTRODUCTION
Epilepsy has become a major neurological disorder of the brain that is characterized by the occurrence of repeated seizures. Normally, the brain sends electrical impulses to the whole body through neurons and neurotransmitters. In the case of an epileptic seizure, these electrical waves are disrupted, resulting in imbalanced reactions from the body [1]. According to the World Health Organization (WHO) [2], more than 50 million people around the world suffer from epileptic seizures, and nearly 80% of them live in low- and middle-income countries. Despite the development of anti-epileptic medications, 33% of epileptic patients do not respond to the medication and are expected to suffer from further unpredictable seizures in the future.
Epilepsy is of two major types: focal and non-focal (generalized). In focal epilepsy, only a specific part of the brain is affected, i.e., one hemisphere of the brain. On the other hand, non-focal epilepsy affects multiple areas within the brain, even areas that were not affected directly by the seizures [1], [3]. The majority of epileptic patients (60%) become pharmacoresistant, that is, they do not respond to medications. Therefore, they require surgical intervention to treat the seizures [4]. As a result, precise localization of seizure areas is important prior to the surgery to reduce the risks that accompany invasive intervention with the brain.
FIGURE 1. The complete procedure followed in the study.

Electroencephalogram (EEG) is considered the gold standard in epilepsy diagnosis for its ability to detect the brain's electrical activity as well as the presence of epileptic seizures and abnormalities [5]. EEG allows for obtaining focal and non-focal signals, thus discriminating between both epilepsy types and identifying the region the seizure originates from. In addition, EEG allows for the identification of both ictal and inter-ictal seizure activities. The ictal EEG presents a continuous wave of spikes, whereas the inter-ictal EEG is characterized by the presence of temporary sharp waves. Sometimes, clinicians use long-term intracranial EEG recording to detect deeper signals and localize the source of the seizure within the brain [3], [5]. Consequently, visual identification of focal and non-focal epileptic EEG signals is time consuming for medical doctors.
In addition, multiple experts may have different views of a patient's EEG patterns. Thus, human error in diagnosing the epileptic seizure source has a high chance of occurring [6]. Hence, there is a major need for an automated classification approach that is capable of distinguishing between the characteristics of such signals and assisting clinicians in their decision-making process prior to surgical intervention. Due to the difficulty in visually differentiating between EEG signals in both epilepsy types, machine learning algorithms have become highly important in detecting differences and classifying such signals. Among these algorithms are the Least-Squares Support Vector Machine (LS-SVM) [7], [8], Adaptive Neuro-Fuzzy Inference System (ANFIS) [9], K-Nearest Neighbor (KNN) [10], and Bayesian Linear Discriminant Analysis (BLDA) [11] learning systems. However, these algorithms require manual feature extraction, including time- and frequency-domain features; features extracted from entropy measures and wavelet transforms are also utilized. Therefore, the use of a deep learning approach could ease the classification process, as it does not require manual feature engineering, which usually requires continuous tuning due to the variable nature of EEG signals. Among these algorithms are Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, which have been used very recently in a few studies for other EEG classification applications [12]–[14].

A. OUR CONTRIBUTION
In this paper, a deep learning approach following an LSTM algorithm is explored for the purpose of focal and non-focal epileptic EEG signal classification (Figure 1). The EEG signals corresponding to both epilepsy types are acquired from an online database and processed for two channels, described later in the paper: the x- and y-channels. Initially, the data-set is pre-processed for all signals by passing them through a z-score normalization step followed by digital filtering. The training and classification process was applied using both channels through a pre-defined Bi-directional LSTM algorithm. For each channel, the LSTM classifier was trained following k-fold cross-validation using 4-, 6-, and 10-fold schemes. Finally, the overall performance is evaluated using several performance metrics, including the accuracy, sensitivity, specificity, precision, F1-score, Cohen's Kappa (κ), Matthews Correlation Coefficient (MCC), and Jaccard Index (JI). Each step is briefly explained in the upcoming sections.

B. PAPER ORGANIZATION
The paper is organized as follows. In Section II, the LSTM network architecture is described briefly with general background information. Section III presents the materials and methods used in the study, including the selected data-set, the pre-processing steps, the training and classification schemes, and the definitions of the evaluation metrics. Section IV provides the results along with a brief discussion of the observations. The paper is concluded with future work in Section V.

II. LONG SHORT-TERM MEMORY
In Recurrent Neural Networks (RNNs), chain-like structures are used to capture temporal sequences within the data. However, this causes problems when training with back-propagation, such as the exploding/vanishing gradient problem [15]. Therefore, LSTM networks were first introduced in 1997 by Hochreiter and Schmidhuber [16] to store the long-term dependencies between data points. LSTM has been used for several applications, including speech recognition, image detection, and language modeling [17], [18].
The architecture of the LSTM network includes memory blocks, which are the input (i), output (o), and forget (f) gates, and a cell that manages the flow of information between the three surrounding gates. The input and output gates control the activation of the input and output information flow, and are described as follows,

$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} C_{t-1} + b_i)$

$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} C_t + b_o)$

In addition, the forget gate controls the memory needed to be kept within the network, and is given as,

$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} C_{t-1} + b_f)$

where $W_{x*}$ is the input-to-gate weight, $W_{h*}$ is the hidden-to-hidden weight, $W_{c*}$ is the peephole weight, $b_*$ is a bias vector, and $h_t$ is the hidden layer output, given by,

$h_t = o_t \tanh(C_t)$

where $c_t$ represents the input cell, which can be defined as,

$c_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$

Moreover, the output cell $C_t$ is described as,

$C_t = f_t \, C_{t-1} + i_t \, c_t$

The input/output cells are connected to the gates by feedback sources with a Constant Error Carousel (CEC) that activates on each input entry to allow gradients to flow unchanged. The activation function is the sigmoid $\sigma(\cdot)$, bounded by (0,1). The previous equations describe the standard LSTM model in the forward chain using the hidden state $h_{t-1}$. To illustrate the backward chain used in Bi-directional LSTM (BDLSTM) networks, the hidden state $h_{t+1}$ is utilized. This results in the overall output,

$y_t = W_{\overrightarrow{h}y} \, \overrightarrow{h}_t + W_{\overleftarrow{h}y} \, \overleftarrow{h}_t + b_y$

where, for all $N$ levels of the stack, $\overrightarrow{h}_N$ and $\overleftarrow{h}_N$ are the hidden layer outputs in the forward and backward directions, respectively. A shortcut view of the BDLSTM is illustrated in Fig. 2, where each LSTM block contains all the gates described earlier.
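As an illustrative sketch only (the study's implementation is in MATLAB), the forward pass of a single peephole LSTM step described by the equations above can be written in NumPy. The weight names, sizes, and random initialization below are our own assumptions for demonstration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One forward LSTM step with peephole connections.

    W["x*"], W["h*"] are input/hidden weight matrices, W["c*"] are
    peephole weights (applied element-wise), and b["*"] are biases.
    """
    i = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + W["ci"] * C_prev + b["i"])
    f = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + W["cf"] * C_prev + b["f"])
    c = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])  # input cell c_t
    C = f * C_prev + i * c                                   # output cell C_t
    o = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + W["co"] * C + b["o"])
    h = o * np.tanh(C)                                       # hidden output h_t
    return h, C

# Tiny usage example with random weights (hidden size 4, input size 3)
rng = np.random.default_rng(42)
W = {k: rng.standard_normal((4, 3)) for k in ("xi", "xf", "xc", "xo")}
W.update({k: rng.standard_normal((4, 4)) for k in ("hi", "hf", "hc", "ho")})
W.update({k: rng.standard_normal(4) for k in ("ci", "cf", "co")})
b = {k: np.zeros(4) for k in ("i", "f", "c", "o")}
h, C = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W, b)
```

Because the output gate is bounded by (0,1) and $\tanh$ by (-1,1), the hidden output $h_t$ always lies strictly inside (-1,1), regardless of the weights.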

III. MATERIALS AND METHODS

A. DATA-SET
The data-set is obtained from the Bern-Barcelona EEG Database, an intracranial EEG study of epileptic patients conducted at the Department of Neurology of the University of Bern, Bern, Switzerland [19]. Five patients with longstanding pharmacoresistant temporal lobe epilepsy were included in the study. All patients were enrolled for epilepsy surgery, and EEG signal acquisition was part of the diagnostic procedure prior to the surgery. Three patients had complete seizure freedom, two patients had auras with no post-surgery seizures, and all five patients had successful surgical outcomes. For every EEG recording, multi-channel EEG signals were acquired using AD-TECH (Racine, WI, USA) intracranial strip and depth electrodes. The reference electrode was extracranial, located between the 10/20 Fz and Pz positions. All EEG signals were sampled at 512 or 1024 Hz depending on the number of channels used (fewer or more than 64 channels). Each EEG signal was recorded for a duration of about 41.66 hours for the focal and non-focal seizure activity. Based on the intracranial recordings, brain seizures could be localized for all patients. Focal EEG signals were recorded using channels that detected the first ictal signal, as judged by visual inspection of two board-certified electroencephalographers and neurologists. The remaining EEG channels were used to record the non-focal signals. The EEG recordings were randomly divided into 3750 pairs of signals named x and y.
For an x signal, one focal EEG channel was selected at random from any patient, while for the corresponding y signal, one of this channel's neighboring focal channels was selected. The random selection of focal channels was done without replacement using a uniform random number generator. The same selection procedure was applied for the non-focal data-set. Prior to storing the EEG signal pairs, they were band-pass filtered (0.5-150 Hz) using a fourth-order Butterworth filter. In addition, they were visually inspected to ensure that no artifacts took place within the recordings. It is worth mentioning that no clinical selection criterion, such as the absence or presence of epileptiform activity, was applied. A sample of each of the focal and non-focal EEG recording pairs from the x and y channels is shown in Figure 3.
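The selection-without-replacement scheme above can be sketched as follows. The channel count, pair count, and the adjacent-index neighbor rule here are hypothetical stand-ins for illustration, not values from the database:

```python
import numpy as np

rng = np.random.default_rng(7)   # seeded for reproducibility (illustrative)

n_focal_channels = 16            # hypothetical number of focal channels
n_pairs = 8                      # hypothetical number of pairs to draw

# Draw x channels uniformly at random WITHOUT replacement,
# so no focal channel is used twice as an x signal.
x_channels = rng.choice(n_focal_channels, size=n_pairs, replace=False)

# For each x channel, pick a neighboring focal channel as the paired
# y signal (here simply the adjacent index, wrapping at the end).
y_channels = (x_channels + 1) % n_focal_channels

pairs = list(zip(x_channels.tolist(), y_channels.tolist()))
```

Sampling with `replace=False` is what guarantees the uniqueness of the selected x channels; the same call is then repeated independently for the non-focal group.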

B. DATASET PREPARATION
In this study, all signals pass through an algorithm to enhance their features prior to the training and classification process. The steps are:

• Z-score Normalization
• Savitzky-Golay Filtering

1) Z-SCORE NORMALIZATION

Z-score normalization is a common signal processing technique used to ensure a balanced view of the data. Usually, large trends in the signals dominate the smaller trends, thus increasing the dynamic amplitude range of the signal. The normalization process yields signals with a mean (µ) of zero and a standard deviation (σ) of 1 by forcing all features to follow a normal Gaussian distribution based on the following equation,

$z = \frac{x - \mu}{\sigma}$

It is found in the literature that signal normalization enhances the overall classification process [20], [21].
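The normalization equation above can be sketched in a few lines of NumPy; the drifting sinusoid used in the example is our own toy signal, not EEG data:

```python
import numpy as np

def zscore(signal):
    """Normalize a 1-D signal to zero mean and unit standard deviation."""
    signal = np.asarray(signal, dtype=float)
    return (signal - signal.mean()) / signal.std()

# Example: a sinusoid with a large offset and amplitude is rescaled
# to mean 0 and standard deviation 1.
t = np.linspace(0.0, 1.0, 1000)
x = 50.0 * np.sin(2 * np.pi * 5 * t) + 20.0
z = zscore(x)
```

After this step, every channel contributes on the same amplitude scale, so no single high-amplitude recording dominates the training.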

2) SAVITZKY-GOLAY FILTERING
The Savitzky-Golay (SG) filter is a finite impulse response filter commonly used to enhance the overall precision of the data [22]. Through a convolution mechanism, the input signal is smoothed with a higher Signal-to-Noise Ratio (SNR). For each signal, a set of polynomial coefficients is obtained using a least-squares method. These coefficients are convolved at each data point of the input signal to produce an enhanced, smoother signal [23]. The coefficients are usually obtained by fitting a low-order polynomial to the input data following this equation,

$Y_j = \sum_{i=-m}^{m} C_i \, y_{j+i}, \quad j = 1, 2, \ldots, n$

where $Y_j$ is the smoothed output computed from the input samples $y_j$ at every $j = 1, 2, \ldots, n$ points, and $C_i$ is the set of $2m+1$ convolution coefficients obtained from the least-squares polynomial fit over the window.
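A minimal NumPy sketch of this least-squares smoothing follows (in practice a library routine such as `scipy.signal.savgol_filter` would be used; the helper names and edge handling here are our own assumptions):

```python
import numpy as np

def savgol_coeffs(window, polyorder):
    """Convolution coefficients C_i for the centre point of the window.

    Fitting a degree-`polyorder` polynomial to the window samples by
    least squares and evaluating it at the centre point is a fixed
    linear combination of the samples: the Savitzky-Golay coefficients.
    """
    half = window // 2
    x = np.arange(-half, half + 1, dtype=float)
    A = np.vander(x, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse gives the fitted value at x = 0.
    return np.linalg.pinv(A)[0]

def savgol_smooth(y, window=25, polyorder=3):
    """Smooth a 1-D signal; edges are handled by replicating end samples."""
    c = savgol_coeffs(window, polyorder)
    half = window // 2
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")
    # np.convolve flips the kernel, so reverse it to get a correlation.
    return np.convolve(padded, c[::-1], mode="valid")

# Sanity check: a cubic signal passes through a polyorder-3 filter
# unchanged (away from the padded edges), since the fit is exact.
t = np.linspace(-1.0, 1.0, 200)
smoothed = savgol_smooth(t**3, window=7, polyorder=3)
```

The defaults mirror the settings reported later in the paper (window length 25, polynomial order 3).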

C. TRAINING AND CLASSIFICATION
The training and classification process was performed using both the x and y EEG channels following a k-fold cross-validation scheme. The process was applied using 4-fold, 6-fold, and 10-fold schemes to ensure several splits of the data when performing the training process. The LSTM network architecture selected was the BDLSTM. Five layers were used in the training network: a sequence input layer, a BDLSTM layer, a fully connected layer, a softmax layer, and a classification layer. The sequence length of the data was 10240, representing the number of elements. The fully connected layer of two classes provides the output sequence to the softmax and classification layers.
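The k-fold splitting described above can be sketched as follows (our own helper, not the paper's MATLAB code); each of the k folds serves once as the held-out test set while the remaining folds are used for training:

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)       # shuffle once, then partition
    folds = np.array_split(order, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Example: 7500 signal pairs split into 10 folds, as in the study
splits = list(kfold_indices(7500, 10))
```

With 7500 samples and k = 10, each iteration trains on 6750 samples and tests on the remaining 750, and every sample is tested exactly once across the 10 folds.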

D. EVALUATION METRICS
The evaluation metrics selected in this study include the accuracy, sensitivity, specificity, precision, and F1-score, defined as follows,

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

$\text{Sensitivity} = \frac{TP}{TP + FN}$

$\text{Specificity} = \frac{TN}{TN + FP}$

$\text{Precision} = \frac{TP}{TP + FP}$

$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$

where true positives (TP) is the number of signals of the class of interest classified correctly, true negatives (TN) is the number of signals of the other class classified correctly, false positives (FP) is the number of signals of the other class classified incorrectly as the class of interest, and false negatives (FN) is the number of signals of the class of interest classified incorrectly as the other class. Furthermore, to evaluate the agreement between the observed and the expert classifications, Cohen's Kappa ($\kappa$) coefficient [24] is used following this equation,

$\kappa = \frac{P_0 - P_c}{1 - P_c}$

where $P_0$ is the observed agreement and $P_c$ is the agreement expected by chance. To elaborate further on the observations of the LSTM classifier, the Matthews Correlation Coefficient (MCC), a measure introduced by Matthews [25] in 1975 to quantify the quality of binary classification, is calculated based on the following equation,

$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$

In addition, the Jaccard Index (JI) [26], often introduced to measure the similarity between two data-sets or two sets of classification observations, is included among the evaluation metrics. It is formulated as,

$JI = \frac{TP}{TP + FP + FN}$
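The definitions above can be checked numerically. Using the x-channel confusion matrix reported in the results section (3737 and 3733 correct classifications, 17 and 13 errors, with focal taken as the positive class), a plain-Python sketch reproduces the headline accuracy and $\kappa$:

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Compute the study's evaluation metrics from a 2x2 confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: chance agreement P_c from the row/column marginals
    p0 = accuracy
    pc = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (p0 - pc) / (1 - pc)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    ji = tp / (tp + fp + fn)
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, precision=precision,
                f1=f1, kappa=kappa, mcc=mcc, ji=ji)

# x-channel confusion matrix: 3737 focal and 3733 non-focal correct,
# 13 focal and 17 non-focal misclassified.
m = binary_metrics(tp=3737, tn=3733, fp=17, fn=13)
```

This yields an accuracy of 0.9960 and $\kappa$ of 0.9920, matching the values reported for the 10-fold x-channel run.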

IV. RESULTS AND DISCUSSION

A. CURRENT OBSERVATIONS
The proposed method was implemented using MATLAB, where each of the 7500 EEG records was pre-processed as mentioned earlier. A sample from the non-focal EEG signals is shown in Fig. 4 before and after the pre-processing steps (normalization and SG filtering). The SG digital filter uses a window of length 25, smoothing every 25 samples with a polynomial of order 3. The selection of these parameters was done after several trial-and-error tests. To evaluate features between the two signal groups (focal and non-focal), Pearson correlation coefficients were calculated between two random recordings using the x and y EEG channel data and were observed as 0.01 and 0.04, respectively. These signals are used as inputs to the BDLSTM network, where 10 cells are used to extract the corresponding signal features [27]. Each time step is considered a feature to be utilized in the LSTM training network. Each cell includes a forward and a backward input/output stream, as described earlier in Section II. At each of the 10240 time steps, the outputs of both streams are element-wise multiplied to obtain a corresponding 10-dimensional representation, which is concatenated and fed to a 2-dimensional fully connected layer followed by a softmax activation function and a classification layer for predictions. This feature extraction/learning process is illustrated briefly in Fig. 5.
TABLE 1. The overall performance of the developed algorithm in classifying FEEG and NFEEG signals using x-channels.

TABLE 2. The overall performance of the developed algorithm in classifying FEEG and NFEEG signals using y-channels.

For the LSTM network settings, the Adaptive Moment Estimation (ADAM) solver was used as the LSTM optimizer for its ability to provide adaptive learning rates [28]. The solver parameters were 0.001, 0.9, and 0.999 for the learning rate (α), β1, and β2, respectively. The training network was selected to have a total of 30 epochs with a mini-batch size of 50. The selection of these parameters was done after several trials. Table 1 shows the overall performance of the developed algorithm in classifying the two EEG signal categories using the x-channels. The best k-fold scenario was the 10-fold cross-validation, as it covers more data in the training process. In addition, the Cohen's kappa (κ) value reached 99.20% with an accuracy of 99.60% in classifying both classes. All other metrics were higher than 99.00%, which reflects high sensitivity and precision. On the other hand, Table 2 shows the overall performance using the y-channels. A Cohen's kappa (κ) of 98.80% was observed with an accuracy of 99.40%. The values of the MCC and JI were close to the Cohen's kappa (κ) values for the two channel signals in the 10-fold cross-validation. This suggests a strong similarity between the original expert classes and the classes predicted by the LSTM classifier.
To elaborate further on the observations, the confusion matrix of the classifier when using the x-channels (highest κ value) is shown in Table 3. The TPs of each class are shown in the diagonal boxes. The LSTM algorithm successfully classified 3737 signals as focal and 3733 signals as non-focal. On the other hand, the algorithm wrongly classified 17 signals as non-focal and 13 as focal. For the confusion matrix of the y-channels shown in Table 4, the observations are close to those of the x-channels; however, the number of correctly classified signals is lower. Both tables provided comparable results, which suggests the possibility of using both channel groups in the classification process. In addition, the high number of correct predictions of the LSTM classifier demonstrates its effectiveness in discriminating between focal and non-focal EEG signals.
In addition, the Receiver Operating Characteristic (ROC) curve and the corresponding Precision-Recall plot are shown in Figure 6.

B. STATE-OF-THE-ART STUDIES
To put the observations of this study into context, several researchers have implemented machine learning algorithms for the purpose of focal and non-focal EEG classification. Table 5 provides a brief summary of the recent research works in this area. All studies covered in the table utilized the well-known Bern-Barcelona database [19] described in Section III. The table shows different machine learning algorithms, including the Adaptive Neuro-Fuzzy Inference System (ANFIS), Least-Squares Support Vector Machine (LS-SVM), optimized SVM, and regular SVM. In addition, a couple of these research works required further decomposition of the EEG signals using the Dual-Tree Complex Wavelet Transform (DT-CWT) and Bivariate Empirical Mode Decomposition (BEMD). The proposed algorithm did not require a feature extraction or signal decomposition step prior to the training and classification process. The current study showed observations close to the literature, with slightly higher values in the averaged overall performance. The overall accuracy of the algorithm reached 99.60%, with a high sensitivity of 99.65% and specificity of 99.55%, using the x-channels and the 10-fold cross-validation scheme. It is worth noting that the studies covered in the summary table did not use deep learning algorithms and required manual feature extraction from the EEG signals.

V. CONCLUSION
In this paper, a study is conducted to investigate the use of an LSTM classifier to distinguish between focal and non-focal EEG signals of patients with epilepsy. The study showed a high accuracy of 99.60% in the classification process using a 10-fold cross-validation scheme. The highest accuracy was obtained when using the x-channels and showed a high agreement of 99.20% with the experts' classification. Both the x and y EEG channels provided comparable results, suggesting that both channels can be used to discriminate focal and non-focal EEG signals. The study suggests LSTM as a potential deep learning algorithm for clinical EEG signal identification. Future work includes improvements to the architecture of the network with further testing on different epilepsy data-sets.