Discrimination of Types of Seizure Using Brain Rhythms Based on Markov Transition Field and Deep Learning

Discrimination of seizure types using the electroencephalogram (EEG) signal has always been a challenging task due to the minuscule differences among different types of seizures. In this regard, deep learning (DL), which has already shown notable performance in image recognition, could be suitable. However, only a few attempts have been made so far, mainly by constructing 2D input images for DL directly from 1D EEG signals using various techniques. Besides, the quality of the generated images has not been verified. Therefore, in this work, 2D images for the DL pipeline have been generated from brain rhythms, which have already displayed remarkable performance in analyzing various brain activities. For this purpose, the Markov transition field transformation technique has been employed for 2D image construction, as it preserves the statistical dynamics of EEG signals, which are very important for discriminating different types of seizures. A convolutional neural network (CNN) has been used for classification. Further, the quality of the 2D images and the suitability of individual brain rhythms have also been investigated. For experimental validation, EEG recordings of six different types of seizure provided by the Temple University EEG dataset (TUH v1.5.2) have been taken into account. The proposed method has achieved a classification accuracy and weighted F1-score of up to 91.1% and 91.0%, respectively. Further analysis shows that higher image resolution provides the best classification accuracy. In addition, the δ rhythm has been found the most suitable for seizure type classification. In a comparative study, the proposed idea demonstrated its superiority by displaying the highest classification performance.


I. INTRODUCTION
EPILEPSY is a neurological disorder that occurs due to abnormal electrical activity of neurons in the brain [1], [2], [3], [4]. It is characterized by a sudden surge of unusual electrical activity in part or all of the brain, known as an epileptic seizure, which can be tracked by EEG signals. In clinical diagnosis, the EEG signal has been considered a reliable and portable means for the analysis and detection of epileptic seizures [2], [3], [4]. Indeed, such detection has been improved by machine learning, and numerous methods have already been proposed [1]. However, only a few attempts have been made to discriminate different types of seizures, which is essential for accurate diagnosis and the subsequent selection of appropriate drugs [5], [6], [7], [8]. Traditional machine learning, whose performance mainly depends on predefined feature selection, might not be suitable, as only minuscule differences exist among different types of seizures. In this context, deep learning (DL), which automatically selects appropriate features, might be suitable [2], [3]. Being a data-driven method, it requires proper input data preparation to achieve its best efficacy [7], [8]. Therefore, in this work, 2D input data have been generated from 1D EEG signals by the Markov transition field, which could efficiently capture the diminutive differences among different types of seizure, and the classification has been performed by a convolutional neural network (CNN), which has already shown notable performance in image-based classification tasks.
In the literature, various machine learning algorithms have been considered to discriminate different types of seizures. For instance, Saputro et al. [9] employed a support vector machine (SVM) along with principal component analysis, using features such as the Mel frequency cepstral coefficient (MFCC) and Hjorth descriptor from the EEG signal, to classify three different types of seizure and achieved an accuracy of up to 91.4%; Wijayanto et al. [10] discriminated four seizure types using statistical features extracted from decomposed components of the EEG and achieved a classification accuracy of 95%; Kassahun et al. [11] classified two types of seizure with different machine learning algorithms and reached an accuracy of up to 77.8%. Shankar et al. [12] employed five different machine learning algorithms to classify three types of seizures along with the seizure-free state, using statistical features extracted directly from raw EEG, and obtained reasonable accuracy. However, the performance of these methods relies entirely on how and which predefined features are chosen, which is not advisable as very small differences exist among different seizure types [5], [6], [7]. Besides, the nonlinear and non-stationary characteristics of the EEG signal make the task more challenging [6], [7], [8]. In this context, DL-based algorithms might be suitable, as they bypass hand-crafted feature engineering, have already shown outstanding performance in image-based classification, including for biomedical signals, and have been applied in seizure type classification [2], [3]. For instance, Cao et al. [13] classified three types of seizures by a hybrid deep neural network that combines squeeze-and-excitation networks (SENet) and long short-term memory (LSTM); Roy et al. [14] used a CNN to discriminate eight types of epileptic seizure and achieved an F1-score of up to 72.20%; Ahmedt-Aristizabal et al. [8] classified seven seizure types using raw EEG signals as input, where a stacked auto-encoder, CNN, recurrent neural network (RNN), and hybrid recurrent CNN (RCNN) were used for classification, and achieved a weighted F1-score of 94.50%.
In this view, DL-based models, especially CNNs, have displayed remarkable performance in image classification and recognition [2], [3], [13], [18], [19], [20]. Several researchers have exploited this benefit of the CNN in seizure type classification by constructing 2D input images from 1D EEG signals. Raghu et al. [5] used different DL models, including a basic CNN, AlexNet, VGG16, VGG19, and GoogleNet, for discriminating seven types of seizure and the seizure-free state, where the 2D input images were constructed by vertically concatenating the spectrograms of the 1D EEG recordings of different channels. Asif et al. [6] proposed a saliency-encoded spectrogram approach to construct 2D images from 1D EEG, which were directly fed into SeizureNet (a combination of several CNN blocks), and achieved an F1-score of up to 94.0%. Liu et al. [16] used the short-time Fourier transform (STFT) to generate 2D input images from 1D EEG for a hybrid bilinear architecture to classify eight types of seizures. Raghu et al. [17] considered transfer learning with a pre-trained network for discriminating eight types of seizure and achieved an accuracy of up to 82.5%. Shankar et al. [15] proposed the Gramian angular field transformation technique for 2D image generation from 1D EEG signals and performed classification by CNN. In a similar work [7], they generated 2D images by employing the continuous wavelet transform (CWT) and classified six different types of seizures and the seizure-free state by a hybrid DL model consisting of a CNN followed by an LSTM. Indeed, transforming a 1D EEG signal into a 2D image is very important for the analysis of different seizure types, as it should preserve the uncertain changes and the statistical and dynamic transition characteristics of EEG signals in order to obtain optimum system performance efficiently [5], [6], [7], [8], [21], [22], [23], [24].
However, these works did not address the relevant properties of the generated 2D images, which are crucial when the images are used as inputs to a deep learning pipeline. Therefore, in this work, a new method of 2D image generation from the 1D EEG signal has been proposed, and different aspects of the generated images have been explored.
In this context, the aforementioned works generated the 2D input images for the CNN from raw EEG signals, which can be further enhanced by considering efficient transformation techniques and brain rhythms for 2D image generation. In EEG-based analysis, five different types of brain rhythms, namely delta (δ: 0.5 Hz-4 Hz), theta (θ: 4 Hz-8 Hz), alpha (α: 8 Hz-12 Hz), beta (β: 12 Hz-30 Hz), and gamma (γ: > 30 Hz), are very often considered in studies of brain activity, including seizure analysis, and have been found very appropriate [4], [24]. Therefore, in this work, the 2D input images for the analysis of different types of seizures have been generated from four brain rhythms (δ, θ, α, and β) instead of the direct EEG.
For seizure recognition, very long EEG recordings are often used, sometimes lasting for hours, in which the appearance of epileptic seizures is quite imprecise [22], [23], [24]. Such long EEG recordings are not always suitable for computation; rather, the consideration of EEG segments might be more useful. Besides, segmentation could fulfill the requirement of large and diverse input samples for DL-based classification tasks. In the literature, several researchers have classified epileptic seizures using EEG segments of different durations (0.5 s, 1 s, 2 s, 4 s, 5 s, 10 s, etc.) and have achieved remarkable performance [19], [20], [21], [22], [23], [24]. Therefore, in this work, EEG segments of a fixed duration, which preserve the descriptive characteristics of the original EEG signals, have been considered for in-depth analysis and classification of different types of seizures.
As mentioned, several techniques, including the scalogram, Gramian angular field, recurrence plot, etc., have been adopted to encode 1D time series into 2D images [18], [19], [20], [21], [22], [23], [24], [25]. However, these techniques fail to preserve the statistical and temporal transition dynamics of an EEG, which are very important for discriminating different seizure types. In this context, the Markov transition field (MTF) transformation might be suitable, as it preserves the statistical transition dynamics and temporal characteristics of 1D data and has been successfully employed in the 2D representation of the 1D EEG signal [26], [27]. Its inverse mapping property allows hidden patterns to be explored pictorially and efficiently [26]. Therefore, the MTF has been considered for 2D input image generation to capture the individual and distinguishing features of EEG signals of different seizure types.
In this study, six different types of seizures have been discriminated by a CNN whose 2D input images have been generated from segments of the 1D EEG signal. The images have been generated from four brain rhythms by adopting the Markov transition field technique. Finally, the generated images have been used for CNN-based classification. Several analyses, including performance evaluation, image quality, suitable brain rhythms, and comparative evaluation, have been performed. The contributions of this study can be summarized as: 1) Seizure type classification using brain rhythms in a deep learning framework. 2) Finding the dominant brain rhythms for seizure type classification. 3) 2D input image encoding from 1D EEG signals of brain rhythms for in-depth feature extraction. 4) Quality analysis of the 2D images when used as input for DL-based classification. The rest of the paper has been organized as follows: Section II describes the proposed method. The experimental methodology has been detailed in Section III, followed by results and discussion in Section IV. Finally, conclusions have been drawn in Section V.

II. PROPOSED METHOD
A system-level overview of the proposed idea has been displayed in Fig. 1. Firstly, the EEG signals have been pre-processed, which includes noise and artifact removal followed by the separation of the different brain rhythms. Thereafter, the EEG signals (brain rhythms) have been divided into segments of a fixed length, each of which is transformed into a 2D image by MTF. Finally, the generated images have been directly fed into the CNN pipeline for the classification of six types of seizures.

A. SIGNAL PREPROCESSING (BRAIN RHYTHMS)
Generally, recorded EEG signals contain noise and artifacts which need to be removed, and various methods are available that can remove them efficiently [1], [2], [3], [4]. In this context, the brain rhythms delta (δ: 0.5 Hz-4 Hz), theta (θ: 4 Hz-8 Hz), alpha (α: 8 Hz-12 Hz), beta (β: 12 Hz-30 Hz), and gamma (γ: > 30 Hz), which have been found very suitable in epileptic seizure analysis, can be easily extracted by choosing bandpass filters with the respective cut-off frequencies [24]. Next, the EEG signal of a particular rhythm has been segmented to extract in-depth features, as the EEG signals recorded for epileptic seizure analysis are very long. Additionally, segmentation benefits DL-based classification, which demands large and diverse input samples. Indeed, classification based on segments has already been found very effective in seizure detection. However, the segment length, which is chosen empirically, may have different durations (0.5 s, 1 s, 2 s, 4 s, 5 s, 10 s, etc.) [19], [20], [21], [22], [23], [24]. In this regard, the EEG recordings of the different brain rhythms have been segmented into pieces with a pre-defined span of 10 s. The segmentation has been performed with a 50% overlap between two consecutive segments to minimize information loss. After that, the segmented EEGs have been transformed into 2D images by MTF as detailed in the following section. The four brain rhythms have been employed separately to observe their suitability in discriminating different types of seizures.
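The segmentation step described above can be illustrated by the following minimal sketch. The function name `segment_signal` and its parameters are hypothetical, the sampling rate of 250 Hz is taken from the dataset description, and discarding an incomplete trailing window is an assumption, since the handling of the last partial segment is not stated.

```python
import numpy as np

def segment_signal(x, fs=250, seg_seconds=10.0, overlap=0.5):
    """Split a 1D signal into fixed-length windows with fractional overlap.

    Assumption: an incomplete window at the end of the recording is dropped.
    """
    x = np.asarray(x, dtype=float)
    seg_len = int(round(seg_seconds * fs))       # samples per segment
    step = int(round(seg_len * (1.0 - overlap))) # hop between segment starts
    if seg_len <= 0 or step <= 0:
        raise ValueError("segment length and step must be positive")
    if x.size < seg_len:
        return np.empty((0, seg_len))
    starts = np.arange(0, x.size - seg_len + 1, step)
    return np.stack([x[s:s + seg_len] for s in starts])

# Example: a 60 s placeholder recording sampled at 250 Hz.
fs = 250
x = np.arange(60 * fs, dtype=float)
segments = segment_signal(x, fs=fs)  # 10 s windows, 50% overlap
print(segments.shape)
```

With a 50% overlap, each new segment starts 5 s after the previous one, so every sample (apart from the first and last 5 s) contributes to two segments.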

B. MARKOV TRANSITION FIELD (MTF)
The Markov transition field (MTF) is a 2D representation of a 1D time series that encodes the transition probabilities of its discretized samples [26], [27], [28]. During 2D encoding, it effectively preserves the temporal behavior and the statistical transition dynamics of the time series. EEG signals can be considered time-series data, and such properties become very important when discriminating different types of seizures that have only minuscule differences in the temporal behavior of their EEG signals. Besides, MTF is simple and can efficiently display dynamic and temporal characteristics. Its precise inverse mapping property makes it more efficient by facilitating the visualization of diverse 2D patterns [26]. First, the MTF technique discretizes the time series into a certain number of bins, which are required to form a Markov transition matrix. The elements of the matrix can be regarded as transition frequencies among bins, while its diagonal refers to self-transitions. Normalizing these frequencies yields the transition probabilities among bins (states), and arranging them in sequential order, i.e., preserving the dynamic transition characteristics, gives the Markov transition field. This MTF can be treated as a 2D image, which can be used as deep learning input [27]. However, the resolution of the generated 2D image needs to be reduced for effective computation [26]. Such a reduction can be handled effectively by resizing the image using the blurring kernel approach, which takes the average pixel value of each non-overlapping patch. Mathematically, let a time series S = {s_1, s_2, s_3, ..., s_n} with n sample points and N bins form a Markov transition matrix. The choice of N is crucial for the detailed visualization of the hidden dynamic transition patterns of the time series. To obtain the optimal N, the bin width (B) (1) has been calculated by the Freedman-Diaconis method [29], where IR represents the interquartile range of S.
Next, N has been measured by (2). Now, each point of the time series has been assigned to its corresponding bin (b_i, b_j), where i, j ∈ {1, ..., N}. From S, a weighted transition matrix (T) of size N×N is then constructed. The matrix measures the transitions among bins as displayed in (3), where m_{i,j} is the transition probability of a sample in b_j followed by a sample in b_i. After normalization of the matrix such that Σ_j m_{i,j} = 1, T becomes the Markov matrix. However, T ignores the conditional relations among the samples of S along with their temporal order. Thus, the transition probabilities of the normalized T are arranged along the temporal order, which is referred to as the Markov transition field M (4).
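The numbered expressions (1)-(4) referred to above are not reproduced in this text. Under the standard definitions of the Freedman-Diaconis rule [29] and of the Markov transition field [26], they would plausibly take the following form. The use of the ceiling in (2) and the sample indices p and q in (4) are assumptions made for this sketch, so the notation may differ from the original typeset equations.

```latex
\begin{align}
B &= \frac{2\,\mathrm{IR}(S)}{\sqrt[3]{n}} \tag{1} \\
N &= \left\lceil \frac{\max(S) - \min(S)}{B} \right\rceil \tag{2} \\
T &= \begin{bmatrix}
  m_{1,1} & \cdots & m_{1,N} \\
  \vdots  & \ddots & \vdots  \\
  m_{N,1} & \cdots & m_{N,N}
\end{bmatrix},
\qquad \sum_{j=1}^{N} m_{i,j} = 1 \tag{3} \\
M &= \begin{bmatrix}
  m_{i,j \,\mid\, s_1 \in b_i,\, s_1 \in b_j} & \cdots & m_{i,j \,\mid\, s_1 \in b_i,\, s_n \in b_j} \\
  \vdots & \ddots & \vdots \\
  m_{i,j \,\mid\, s_n \in b_i,\, s_1 \in b_j} & \cdots & m_{i,j \,\mid\, s_n \in b_i,\, s_n \in b_j}
\end{bmatrix} \tag{4}
\end{align}
```

In this form, T has one row and one column per bin (size N×N), whereas M has one row and one column per sample (size n×n): the entry for the sample pair (s_p, s_q) is the transition probability between the bins that contain s_p and s_q.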
In M, the transition probability of a point from b_i to b_j is denoted by M_{i,j}. Therefore, M_{i,j} gives the transition probability from a point s_p (s_p ∈ b_i) to a point s_q (s_q ∈ b_j), where p, q ∈ {1, ..., n} follow the temporal order, which indicates the temporal dependency. The diagonal elements M_{i,i} describe the self-transition probabilities. Further, the matrix M can be regarded as a 2D image of the transition probabilities of the discretized sample points of the time series in temporal order, which could help to track minuscule differences among very similar signals with tiny variations. Such small differences are present among the EEG signals of different types of seizures, and the MTF could be suitable in this regard.
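The construction of T and M can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this work: the function name `markov_transition_field` is hypothetical, equal-width bins between the minimum and maximum of the segment are assumed (consistent with a single bin width B), and the rows of T are taken to index the current bin while the columns index the next bin.

```python
import numpy as np

def markov_transition_field(s, n_bins=None):
    """Return (M, T, states) for a 1D series s.

    T is the row-normalized N x N transition matrix between bins and
    M is the n x n field with M[p, q] = T[bin of s[p], bin of s[q]].
    """
    s = np.asarray(s, dtype=float)
    n = s.size
    if n_bins is None:
        # Freedman-Diaconis bin width: 2 * IQR / n^(1/3).
        q75, q25 = np.percentile(s, [75, 25])
        width = 2.0 * (q75 - q25) / np.cbrt(n)
        span = s.max() - s.min()
        n_bins = int(np.ceil(span / width)) if width > 0 else 2
    n_bins = max(int(n_bins), 2)

    # Assumption: equal-width bins between min and max of the segment.
    inner_edges = np.linspace(s.min(), s.max(), n_bins + 1)[1:-1]
    states = np.digitize(s, inner_edges)  # integers in 0 .. n_bins - 1

    # Count transitions between the bins of consecutive samples.
    T = np.zeros((n_bins, n_bins))
    np.add.at(T, (states[:-1], states[1:]), 1.0)

    # Normalize each visited row so that it sums to one.
    row_sums = T.sum(axis=1, keepdims=True)
    T = np.divide(T, row_sums, out=np.zeros_like(T), where=row_sums > 0)

    # Spread the bin-level probabilities over all sample pairs (p, q).
    M = T[states[:, None], states[None, :]]
    return M, T, states

# Example on a synthetic series (not EEG data).
t = np.linspace(0.0, 4.0 * np.pi, 200)
M, T, states = markov_transition_field(np.sin(t), n_bins=8)
print(M.shape, T.shape)
```

For a 10 s segment sampled at 250 Hz, n = 2500, so M would be a 2500×2500 matrix before any size reduction.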

C. 2D INPUT IMAGE
Next, the image generated from M has been reduced in size to lower the computational cost of the DL model [26], [27], [28]. For this purpose, the blurring average kernel technique has been employed, which is simple but very efficient. Basically, it reduces the size by taking the average value of each non-overlapping k×k patch with the average kernel {1/k^2}_{k×k}. It aggregates the transition probabilities of each subsequence of size k and provides better visualization of the patterns and transition statistics of the time series [26]. In seizure type classification, the size of the input images may influence the performance, which should be examined. For this purpose, three different image sizes have been examined empirically.
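The patch-averaging reduction can be sketched as below. The function name `blur_resize` is hypothetical. When the image side is not an integer multiple of the target size, this sketch uses patches whose side lengths differ by at most one pixel; that choice is an assumption, since the handling of this case is not specified in the text.

```python
import numpy as np

def blur_resize(M, size):
    """Shrink a square matrix to size x size by averaging non-overlapping patches."""
    M = np.asarray(M, dtype=float)
    if M.ndim != 2 or M.shape[0] != M.shape[1]:
        raise ValueError("M must be a square 2D array")
    n = M.shape[0]
    if not 0 < size <= n:
        raise ValueError("size must be between 1 and the side length of M")
    # Integer patch boundaries; patches are k x k when n == k * size.
    edges = (np.arange(size + 1) * n) // size
    starts = edges[:-1]
    counts = np.diff(edges)
    # Sum over row blocks, then column blocks, then divide by patch areas.
    sums = np.add.reduceat(np.add.reduceat(M, starts, axis=0), starts, axis=1)
    return sums / np.outer(counts, counts)

# Example: a 4 x 4 matrix reduced to 2 x 2 with k = 2.
A = np.arange(16, dtype=float).reshape(4, 4)
small = blur_resize(A, 2)
print(small)
```

Because each output pixel is a plain mean of transition probabilities, the reduced image stays within the range [0, 1].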

D. DEEP LEARNING
The DL model consists of several processing layers that learn and extract relevant features from input data [2], [3]. In this work, a convolutional neural network (CNN) has been considered for classification, which extracts suitable features from input images automatically through parameter sharing and sparse connectivity [18]. However, the number of layers plays a crucial role in a CNN and should be selected carefully: a large number of layers may extract effective features at the cost of computational complexity, whereas too few layers may fail to find appropriate features [19], [20], [21], [22], [23]. In this work, the proposed CNN has been designed with five hidden layers, two affine layers, and one output layer as displayed in Fig. 2. Each hidden layer is a stack of convolution, pooling, and dropout layers, which learns and extracts relevant features automatically. In the convolution layer, the convolution operation is performed by a pre-defined kernel that shifts pixel by pixel over the input matrix and extracts numerous features. Next, the outcome passes through a non-linear activation function followed by a pooling layer, which improves learning ability and robustness. The pooling operation reduces the dimension of the features. In this work, the rectified linear unit (ReLU) and max pooling have been chosen as the activation function and the pooling operation, respectively. The fully connected affine layers finally perform the classification based on the features generated by the previous layers. At the last layer, the softmax activation function has been used for the final classification. All parameters of the proposed CNN pipeline have been detailed in Fig. 2.
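To make the operations inside one hidden layer concrete, the following sketch runs a single convolution, ReLU, and max-pooling step followed by a softmax on a toy input. It is not the proposed network: the kernel values, the 2×2 pooling size, and the six class scores are hypothetical, dropout and training are omitted, and the actual layer parameters are those given in Fig. 2.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image one pixel at a time (no padding).

    As in most DL libraries, the kernel is not flipped (cross-correlation).
    """
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def relu(x):
    """Rectified linear unit: negative responses are set to zero."""
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Keep the maximum of each non-overlapping size x size patch."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]  # drop rows/columns that do not fill a patch
    return x.reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    """Convert class scores into probabilities that sum to one."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Toy 6 x 6 input and a hypothetical 2 x 2 kernel.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])
features = max_pool2d(relu(conv2d_valid(image, kernel)))

# Six hypothetical class scores, one per seizure type.
probs = softmax(np.array([2.0, 1.0, 0.5, 0.0, -1.0, -2.0]))
print(features.shape, probs.argmax())
```

In a full model, many such kernels are learned per layer and the pooled feature maps of the last hidden layer are flattened before the affine layers.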

III. EXPERIMENTAL METHODOLOGY
A. DATA
For validation of the proposed idea, the Temple University Hospital EEG dataset (TUH v1.5.2) has been considered [30]. In this dataset, EEG recording has been conducted with two unipolar montage approaches: Average Reference (AR) and Linked Ears Reference (LE) [31]. For AR, a certain set of electrodes is used as the reference, while LE adopts a lead connector that joins the right and left ears as the reference. In this work, the LE reference has been taken into account, as it provides a steadier reference point with minimum artifacts [31]. In the LE unipolar montage approach, the EEG signals were recorded with a sampling rate of 250 Hz and 16-bit resolution. In this study, the EEG recordings of 19 common channels (FP1_le, FP2_le, F3_le, F4_le, F7_le, F8_le, C3_le, C4_le, O1_le, O2_le, P3_le, P4_le, Pz_le, T3_le, T4_le, T5_le, T6_le, Cz_le, and Fz_le) have been used. A brief description of the EEG data has been summarized in Table 1, in which the seizure types and the corresponding recording durations are given in the first and second columns, respectively. In total, EEG recordings of 40 patients have been used. To mitigate dataset imbalance, recordings of the different seizure types have been selected in approximately equal proportion for further processing.

B. EXPERIMENT SETUP
The EEG recordings of the different seizure types are free from artifacts. So, the signals have been used directly for the extraction of four different brain rhythms (δ, θ, α, and β) by using a fifth-order Butterworth bandpass filter with the corresponding cut-off frequencies. Thereafter, the whole EEG signals of the different seizure types have been segmented with a duration of 10 s and a 50% overlap. The EEG segments of channel Cz of the δ rhythm for different types of seizures have been displayed in Fig. 3. Further, for each EEG segment, a 2D image has been generated by MTF transformation. Next, images with three different resolutions (32×32, 64×64, and 128×128) have been generated by using the simple blurring average kernel technique in order to determine the optimum image quality. Besides, the adoption of such a step could reduce computation and speed up the training process. The encoded 2D images from the EEG recording of channel Cz of the δ rhythm of CPS, GNS, and TCS have been shown in Fig. 4. Now, for training and testing of the CNN model, the dataset has been randomly split into training and test sets in an 80:20 ratio.
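The rhythm extraction step can be illustrated with SciPy as in the sketch below. The band edges and the filter order follow the text, while the function name `extract_rhythm`, the second-order-sections form, and the zero-phase (forward-backward) application via `sosfiltfilt` are assumptions made for this sketch; a forward-backward pass doubles the effective filter order, and the text does not state whether the filter was applied in one or both directions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Band edges in Hz for the four rhythms used in this work.
RHYTHM_BANDS_HZ = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (12.0, 30.0),
}

def extract_rhythm(x, band, fs=250.0, order=5):
    """Band-pass filter a 1D signal to isolate one brain rhythm."""
    low, high = RHYTHM_BANDS_HZ[band]
    # Second-order sections are numerically safer for narrow, low bands.
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Assumption: zero-phase filtering, so the rhythm is not delayed.
    return sosfiltfilt(sos, x)

# Synthetic example (not EEG data): a 2 Hz plus a 20 Hz sinusoid.
fs = 250.0
t = np.arange(0.0, 20.0, 1.0 / fs)
slow = np.sin(2.0 * np.pi * 2.0 * t)
fast = np.sin(2.0 * np.pi * 20.0 * t)
delta = extract_rhythm(slow + fast, "delta", fs=fs)
beta = extract_rhythm(slow + fast, "beta", fs=fs)
```

In this synthetic example, the 2 Hz component should survive mainly in the δ output and the 20 Hz component mainly in the β output.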
In addition, 10% of the training samples have been used for the validation of the model. The Adam optimizer with first (β1 = 0.9) and second (β2 = 0.999) moment estimation parameters, a decay rate of 1×10^-6, and the categorical cross-entropy loss have been considered during the training of the model. In addition, the learning rate, number of epochs, and batch size have been set to 0.1, 100, and 128, respectively, for all classification tasks. The performance of the model has been evaluated by considering two parameters: accuracy (η) and weighted F1-score (F1) [7], [18]. Indeed, η sums up how well a model performs across all categories, and it is useful when all the classes are equally significant. Besides, F1 is very important for the analysis of biomedical data when the classes are not present in equal proportion [19], [20]. Compared to η, it gives a more precise evaluation of instances that are wrongly identified [21], [22], [23]. The η (6) and F1 (7) of the different classification tasks achieved by the proposed CNN model have been measured, where T_p and T_n refer to true positives and true negatives, respectively, whereas F_p and F_n denote false positives and false negatives, respectively.
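The two metrics can be computed as in the following sketch. Expressions (6) and (7) are not reproduced in this text, so the sketch assumes the common multi-class definitions: η is the fraction of correctly classified samples, the per-class score is F1 = 2T_p / (2T_p + F_p + F_n), and the weighted F1-score averages the per-class scores with weights proportional to the number of true samples in each class. The function names and the toy labels are hypothetical.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def weighted_f1(y_true, y_pred):
    """Support-weighted average of the per-class F1-scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, support = np.unique(y_true, return_counts=True)
    scores = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.average(scores, weights=support))

# Toy labels for three classes (not results from this work).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
acc = accuracy(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)
print(round(acc, 4), round(f1, 4))
```

In this toy case the per-class scores are 0.8, 0.8, and 1.0 with supports 3, 2, and 1, so both η and the weighted F1-score equal 5/6.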

IV. RESULTS AND DISCUSSION
A. TRAINING AND VALIDATION
The training performance of the model has been evaluated by measuring the training (T_η) and validation (V_η) accuracy. In the corresponding figure, T_η and V_η with their corresponding losses have been displayed for input images with 128×128 resolution, where bold and dotted lines indicate the accuracy and loss, respectively. The left and right vertical axes display the training-validation accuracy (T_η-V_η) and loss (T_l-V_l), respectively. As seen, the training performance is consistent and improves significantly with an increasing number of epochs. Further, the training-validation accuracy (T_η-V_η) and loss (T_l-V_l) become steady near 100 epochs. Hence, empirically, the model has been trained with 100 epochs and a batch size of 128.

B. IMAGE QUALITY EVALUATION
To verify the image quality, input images with three different resolutions of 32×32, 64×64, and 128×128 have been examined individually by measuring the η and F1 scores. For the different brain rhythms, the results have been displayed in Fig. 6, in which the vertical axis represents the η and F1 scores and the horizontal axis indicates the brain rhythms, shown separately for the three image resolutions. As seen in Fig. 6, the highest performance was achieved with the input image resolution of 128×128 for all brain rhythms. Further, the classification performance is the highest for the δ rhythm, with η reaching 88.7%, 89.1%, and 91.2% for image resolutions of 32×32, 64×64, and 128×128, respectively. The results show that increasing the image resolution improves the classification accuracy. However, a high image resolution also increases the computation. Therefore, the resolution needs to be optimized according to the overall system objective.

C. BRAIN RHYTHMS ANALYSIS
It is important to evaluate the dominant brain rhythm in discriminating different types of seizures. For this purpose, all brain rhythms have been individually considered for seizure type discrimination using 128×128 images, and the respective η and F1 have been measured. The results have been displayed in Fig. 6. As seen, the δ rhythm reached the highest classification η and F1 score (≥ 85%) compared with the other brain rhythms. Hence, the δ rhythm is suitable for seizure type discrimination, and the further analysis has been performed by considering the δ rhythm. Now it is important to know whether any specific type of seizure is influencing the overall classification performance. For this purpose, different combinations of seizure types (each excluding one type) for the δ rhythm with 128×128 image size have been analyzed, and the results have been summarized in Table 2, where the first and second columns represent the different combinations of seizure types and the performance metrics, respectively. As seen, the maximum classification η and F1 have been achieved by the proposed model during the classification of ABS, CPS, GNS, MYS, and TCS (third row), i.e., without FNS, which suggests that FNS may share some characteristics with other types of seizures; in contrast, the minimum classification performance has been obtained for ABS, CPS, FNS, GNS, and TCS (fifth row), i.e., without MYS. Therefore, during seizure type discrimination, the FNS and MYS types of seizures should be handled carefully. However, all the classification results are very consistent, which also validates the efficacy of the proposed model.
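The leave-one-type-out combinations analyzed above can be enumerated as in the short sketch below. The helper name `leave_one_out_subsets` is hypothetical, and the order in which the subsets are produced is not meant to match the row order of Table 2.

```python
from itertools import combinations

# The six seizure types considered in this work.
SEIZURE_TYPES = ("ABS", "CPS", "FNS", "GNS", "MYS", "TCS")

def leave_one_out_subsets(labels):
    """Yield (excluded_label, kept_labels) for every subset omitting one label."""
    for kept in combinations(labels, len(labels) - 1):
        excluded = next(label for label in labels if label not in kept)
        yield excluded, kept

subsets = list(leave_one_out_subsets(SEIZURE_TYPES))
for excluded, kept in subsets:
    print(f"without {excluded}: {', '.join(kept)}")
```

Each subset defines a five-class problem on which the same model can be retrained and evaluated, so that the effect of removing a single seizure type can be compared across subsets.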

D. COMPARATIVE ANALYSIS
Finally, a comparative study has been performed with recently published works, and the results have been summarized in Table 3, which lists the related works, their respective methods and models, and the classification performance metrics. As seen, among the compared works, the proposed idea offers the best classification performance in terms of both accuracy and F1-score.

V. CONCLUSION
In this study, six different types of seizures have been classified by a CNN whose 2D input images have been generated from the 1D EEG signal. For this purpose, four brain rhythms (δ, θ, α, and β) have been taken into account. For 2D input image generation, the Markov transition field has been employed, which preserves the temporal and statistical transition dynamics of EEG recordings. To determine the optimum image quality, three different image resolutions (32×32, 64×64, and 128×128) have been taken into consideration for the analysis of seizure types. The proposed idea has been verified on the Temple University EEG dataset (TUH v1.5.2). The proposed method achieves a classification accuracy and weighted F1-score of up to 91.1% and 91.0%, respectively. Further analysis shows that images with higher resolution could improve the classification performance. In addition, the δ rhythm has been found very suitable for seizure type classification. In the comparative evaluation, the proposed method demonstrated its superiority by displaying the best classification performance. Such a framework can be extended to other domains of EEG signal analysis.