
Medical experts employ electroencephalography (EEG) to analyze the electrical activity of the brain and infer disorders. However, the time of human experts is costly, and the manual examination of EEGs therefore consumes substantial medical resources. In this study, an improved one-dimensional CNN-only system of 25 layers is proposed to identify abnormal and normal adult EEG signals from a single EEG montage, without using any explicit feature extraction technique. Most previous deep learning systems proposed for this problem use extremely deep architectures containing very large numbers of layers. This study also presents an independent preprocessing module that has been exhaustively evaluated for optimal parameters targeting adult EEG signal classification. The accuracy achieved by the proposed classifier as part of the decision support system is 82.24%, a substantial improvement of ~3% over the previous best reported classifier of comparable depth. The system also exhibits significantly higher F1-score and sensitivity as well as lower loss. The proposed system is intended to be part of an expert system for overall brain health evaluation.

INDEX TERMS Electroencephalography, Machine Learning, Decision Support Systems, Convolutional Neural Networks, Data Preprocessing, Savitzky-Golay Filtering, Optimization


I. INTRODUCTION
Electroencephalography (EEG) is among the most pivotal diagnostic techniques used for the detection of brain-related conditions. It can be used to distinguish between and analyze conditions such as syncope, migraine variants, epileptic seizures, psychogenic non-epileptic seizures, subcortical movement disorders, encephalopathy, and catatonia. Beyond being targeted at such brain analyses, one of the primary advantages of EEG is that it is a non-invasive method that registers electrical activity using wet or dry electrodes attached to the scalp [1]; the electrical activity is thus detectable on the skin surface in the form of electric potential.

Signal acquisition
For standard EEG acquisition, approximately two to a few dozen (wet or dry) electrodes are placed on the subject's scalp in standard positions. Voltage fluctuations, usually on the order of tens of microvolts (μV), are then measured between selected pairs of electrodes and highly amplified [1]. A typical EEG recording is thus a set of many irregular simultaneous tracings indicating changes in voltage between pairs of electrodes. Due to the non-invasive mode of signal acquisition, a significant amount of noise can enter the recording and reduce signal quality; as a result, the EEG signal is generally considered impoverished [2]. This nature of the EEG signal adversely affects the capability to differentiate between the various underlying medical conditions associated with the characteristics of the signal, and therefore poses a major challenge to the human expert diagnosing the subject. This challenge is one of the primary motivations of the present work.

EEG as suitable data
The biomedical relevance of EEG signals for a human expert is indubitable. However, working on developing an automated system for the classification of EEG signals demands a rationale to justify the use of these signals as 'suitable' data. From the data perspective, recording the brain's electrical activity using an electroencephalogram is one of the primary methods of data acquisition from the human brain. Furthermore, it exhibits excellent temporal resolution, which supports the argument of using the signals as inputs for an intelligent system that can infer abnormalities at the neural level by analyzing the signal in the time domain.

Normal and abnormal adult EEG signals
For adult EEGs, normality and abnormality have specific semantics; they are defined differently for different age and gender groups. The two fundamental tasks carried out by a human expert to analyze the EEG signal of a subject for normality are (1) background recognition and (2) transients recognition. Among the usual background characteristics (one of them a rhythm at 8-10 Hz, usually visible in 17%-19% of adults), the most relevant for a human expert identifying an abnormal EEG is the alpha rhythm [3]. It is also known as the Posterior Dominant Rhythm (PDR). It emerges in the occipital region of the brain when the eyes of the subject are closed, and it fades away when the subject enters a state of drowsiness. The evaluation of an EEG as abnormal or normal depends largely on the analysis of the reactivity observed in the EEG during the emergence of this occipital alpha rhythm. Therefore, the presence, frequency, and distortion of the PDR are considered the primary criteria for classifying an EEG as normal or abnormal [2].

Medical interpretation of an abnormal EEG
An abnormal EEG can correspond to various conditions, including, but not limited to, focal slowing or generalized slowing caused by conditions such as epilepsy, delirium or encephalopathy, dementia, and coma [4]. For biomedical interpretation, exact association is assumed between an abnormality in the waveform of an EEG and a corresponding brain condition [5]. These conclusions consider specific signal characteristics such as sharps, waves, and spikes that are indicative of a known abnormal pattern but are difficult to observe for a novice EEG interpreter.

Problems in manual interpretation
Concluding abnormality in an EEG is not straightforward. For example, the abnormal nature of an EEG can be inferred from long and clear periods of wave-and-spike activity in the signal, called epileptiform discharges. However, the mere presence of such activity is not a sufficient indicator of abnormality, because a morphologically identified epileptiform pattern may be a benign variant not necessarily linked with epilepsy [3]. A more common issue with EEGs is the amount of external background noise and internal natural noise, which further complicates manual interpretation. Another challenge in the interpretation of abnormal EEGs is covering all kinds of abnormalities [4].
Therefore, any decision support system aimed at identifying abnormal EEGs must interpret benign variants, eliminate different types of signal noise, and cover the variation of signals corresponding to different abnormalities.

A summary of related systems is presented in Table I. Deep convolutional neural networks have been used for the automated detection of Parkinson's disease using EEGs [15], with an accuracy of 88.25%. A range of machine learning methods has also been applied to the automated detection of Alzheimer's disease using EEGs [7] [8] [16]. A previous study [14] used a deep CNN to develop a system for automated screening of depression using EEGs, reporting an accuracy of 95.49%. Machine learning methods have been used to detect epilepsy using EEGs in several previous works [9] [10] [17]. Another diagnosis universally made using EEGs is the detection of seizures.

B. RELATED WORKS
Deep CNNs and complex mathematical features such as wavelet and Fourier features have been used to detect seizures automatically [11] [12]. The deep CNN solution that uses wavelet features to classify EEG signals as epileptic or normal [6] is 88.7% accurate. Three-dimensional optimized convolutional neural networks have also been used for epileptic seizure prediction [18], and pyramidal 1D CNNs have been used for the detection of epilepsy [13] with an accuracy of ~99%.
The use of CNNs for the automated identification of EEG signals associated with any abnormality has also been attempted, with a test accuracy of 79.34% on the TUH-EEG Abnormal dataset in [19]. The model used in [19] comprises 24 layers. The best CNN-only solution presented in [20] for abnormal EEG identification on the same dataset gives an accuracy of 76.90%. Although there have been very recent systems for abnormal EEG identification with accuracies in the range of 85% to 97% [21], the numbers of layers used in these deep learning architectures are in the order of hundreds; these systems therefore suffer from a high computational load even during testing. The system proposed in this study improves the accuracy of the system presented in [19] by ~3% with minimal change in the number of layers. Table I also shows a survey of existing systems proposed for the identification of EEG signals associated with all abnormalities [19] [20]. The present work uses the same dataset as [3], [19], and [20], and the performance metrics of the proposed system show an improvement over these systems.

A. DATASET
The TUH-EEG Abnormal dataset [3] [27] was used in the current study. This dataset is a subset of the TUH-EEG Corpus [28], which at the time of this study was the largest publicly available dataset of EEGs. The distributions of the dataset, with respect to the number of patients and the number of EEG sessions, are shown in Tables II, III, and IV.

B. PROPOSED SYSTEM
The proposed system comprises (i) a preprocessing module, and (ii) a 1D convolutional neural network. The modules of the system are described in the following subsections. The complete system is depicted in Fig. 1.

1) PREPROCESSING MODULE
The EEG signals have been preprocessed using a six step process before being fed to the CNN. Fig. 2 depicts the components of the preprocessing module.

FIGURE 1. Proposed system for the automated identification of abnormal EEG signals
Step 1 - Montage Selection The raw input signals contain 24-36 channels, out of which the montage of the posterior temporal to occipital (T5-O1) channels has been selected for classification. The studies in [19] and [21] support the selection of this montage for the automated identification of abnormal EEG signals. Moreover, the T5-O1 montage is also recommended as one of the standard longitudinal bipolar montages for clinical use [29].
Step 2 - Segmentation Each EEG recording in the dataset has been segmented to its first 90 seconds. This segment duration was determined through a comparative analysis carried out by training the 1D-CNN on segments from 5 to 300 seconds, changing only the dimensions of the model's input layer. At the dataset's sampling frequency of 250 Hz, each instance therefore consists of 22500 frames.
Step 3 - Trimming The initial frames of most EEG signals in the dataset contain undesirable extrinsic artifacts. Therefore, the first 1250 frames (trimming threshold = 5 seconds) of each segment have been cut. This trimming threshold was determined through a comparative analysis of model accuracies, varying the threshold from 0 to 80 seconds (20000 frames).
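As a sketch, the segmentation and trimming steps (Steps 2 and 3) reduce to simple array slicing, assuming a single-channel signal sampled at the dataset's 250 Hz:

```python
import numpy as np

FS = 250          # sampling frequency (Hz), as stated for this dataset
SEGMENT_S = 90    # segment duration (s), Step 2
TRIM_S = 5        # trimming threshold (s), Step 3

def segment_and_trim(signal: np.ndarray) -> np.ndarray:
    """Keep the first 90 s of a single-channel EEG, then drop the first 5 s."""
    segment = signal[: SEGMENT_S * FS]   # 22500 frames
    return segment[TRIM_S * FS :]        # drop first 1250 frames, 21250 remain

raw = np.random.randn(75_000)            # mock 5-minute recording
x = segment_and_trim(raw)
assert x.shape == (21_250,)
```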
Step 4 - Min-Max Scaling Each EEG in the dataset may have minimum and maximum values different from the others, causing significant variation in values across the dataset. To overcome this variation, the EEG data has been normalized to the range [0, 1] using min-max scaling. A significant advantage of this scaling is that the scaled sample values are limited to a range that allows comparatively simpler computations on smaller numbers during further processing.
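Min-max scaling can be expressed per signal in a few lines of NumPy (a sketch equivalent in effect to scikit-learn's MinMaxScaler applied to a single segment):

```python
import numpy as np

def min_max_scale(x: np.ndarray) -> np.ndarray:
    """Rescale a single EEG segment to the range [0, 1]."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo)

x = np.array([-40.0, 0.0, 10.0, 60.0])   # mock EEG amplitudes in microvolts
y = min_max_scale(x)                      # [0.0, 0.4, 0.5, 1.0]
```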
Step 5 - Standard Scaling If the EEG signals fed to the CNN are not standardized, the performance of the model is likely to worsen, as the data does not resemble standard normally distributed data. Therefore, the EEG signals have been standardized by centering the samples around their mean (subtracting the mean from the sample values) and scaling them to unit variance.
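A minimal NumPy sketch of this standardization step (equivalent in effect to scikit-learn's StandardScaler on a single signal):

```python
import numpy as np

def standard_scale(x: np.ndarray) -> np.ndarray:
    """Center a signal on zero mean and scale it to unit variance."""
    return (x - x.mean()) / x.std()

x = np.random.default_rng(0).normal(5.0, 3.0, size=21_250)  # mock segment
z = standard_scale(x)   # z now has mean ~0 and standard deviation ~1
```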
Step 6 - Savitzky-Golay Filtering The Savitzky-Golay (SavGol) filtering technique uses simplified least-squares procedures to smooth a signal [30]. It uses convolution to fit successive subsets of adjacent sample values with an N-degree polynomial. Smoothing uses the N-degree polynomial to approximate W sample values, where W is the length of the filter window; the number of convolution coefficients used for smoothing is therefore W. This technique increases the signal-to-noise ratio without significantly distorting the local behavior of sample values in the signal. A previous study [31] compared multiple smoothing techniques and supported the use of this filtering technique for smoothing EEG signals: when different models were fit to the smoothed signals and tested for classification, SavGol filtering showed the best performance.
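SciPy provides this filter directly as `scipy.signal.savgol_filter`; the sketch below applies it to a mock noisy rhythm using the window length and polynomial order optimized later in this study (W = 51, N = 9):

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(42)
t = np.linspace(0, 1, 2500)
clean = np.sin(2 * np.pi * 10 * t)                 # mock 10 Hz rhythm
noisy = clean + 0.3 * rng.standard_normal(t.size)  # additive noise

# SavGol smoothing with the study's optimized parameters: W = 51, N = 9
smoothed = savgol_filter(noisy, window_length=51, polyorder=9)

# Smoothing should bring the signal closer to the noise-free reference
assert np.mean((smoothed - clean) ** 2) < np.mean((noisy - clean) ** 2)
```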

2) PROPOSED 1D CNN
The proposed convolutional neural network has 25 layers, as depicted in Fig. 3. It includes 1D convolution layers, max-pooling layers, dropout layers, batch normalization layers, a flatten layer, two dense layers, and a final softmax layer for classification. Since valid padding is used in all the convolutional and pooling layers, the right-edge inputs not covered by the respective filters are dropped. Other than the softmax activation used for binary classification at the last layer, all activations in the model are rectified linear units.
The input layer of the model takes the output of the preprocessing module as its input. The first 1D convolutional layer applies eight filters of kernel size 23 to the input signal, with a stride of 3. Since a large amount of redundant information is present in the 1D feature maps after the first convolution layer, an initial dropout layer randomly sets input units to zero with a rate of 0.2; the remaining inputs are scaled up by a factor of 1.25 (= 1/(1 - 0.2)), which keeps the expected sum of all inputs unchanged. The next layer in the proposed model is a 1D max-pooling layer. A pool size of 2 selects the maximum value in each 2-sized window as the representative of that window, so the max-pooling operation halves the size of the feature map. The filter window in this layer has a stride of 2.
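With valid padding, every 1D convolution or pooling layer shortens its input according to floor((L - k)/s) + 1. A quick check of this formula for the first two layers described above, assuming the 21250-frame input produced by the preprocessing module:

```python
def out_len(length: int, kernel: int, stride: int) -> int:
    """Output length of a 1D conv/pool layer with valid padding."""
    return (length - kernel) // stride + 1

L0 = 21_250                              # frames after segmentation and trimming
L1 = out_len(L0, kernel=23, stride=3)    # first conv layer: k = 23, s = 3
L2 = out_len(L1, kernel=2, stride=2)     # first max-pooling layer: pool = 2, s = 2

assert L1 == 7076   # conv output length
assert L2 == 3538   # pooling halves the feature map
```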
A batch normalization layer is used next to reparametrize the model by normalizing the output using the mean and standard deviation of the input batch, removing the harmful effects of internal covariate shift [32]. Another 1D convolutional layer applies 64 filters of kernel size 13 with a stride of 1, followed by a max-pooling layer that halves the size of the feature map. Applying convolution followed by max-pooling in a deep CNN extracts the relevant features through the convolution layers, intensifies them through max-pooling, and discards information that is less relevant for classification. Two 1D convolution layers and a max-pooling layer are used next for further feature extraction, with the same rationale of discarding some irrelevant information. Layer-wise activation maps show the presence of redundant data in the 1D feature maps after this stage, so another dropout layer with the same parameters is used. Five 1D convolution layers follow for further feature extraction. A dropout layer then randomly sets input units to zero with a rate of 0.1, scaling the remaining inputs by a factor of 1.11, and the next 1D max-pooling layer again halves the size of the feature map. Two 1D convolutional layers are then used that apply 32 and 16 filters of kernel size 3 with a stride of 1, followed by another 1D max-pooling layer. A second batch normalization layer normalizes the output using the mean and standard deviation of the input batch. The feature maps are then flattened to prepare for the dense connections in the following layers, and another dropout layer removes redundant information. The outputs of this layer are fed to a densely connected layer of 64 units to proceed towards the final classification. In this layer, the output is calculated using (1).

O = A((I · K) + B)                (1)
Here I is the input, K is the kernel, B is the bias vector, A is the element-wise activation function, and O is the layer's output. The subsequent output is then fed into the last densely connected layer of 2 units. Finally, softmax activation is used to classify the input as either 0 or 1, corresponding to the target normal and abnormal classes of EEG signals respectively.
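A NumPy rendering of (1) for the last two layers; the 128-unit flattened feature vector and the random weights are illustrative placeholders, not the learned values of the proposed model:

```python
import numpy as np

def dense(I, K, B, A):
    """Equation (1): O = A((I . K) + B) with input I, kernel K, bias B, activation A."""
    return A(I @ K + B)

def relu(v):
    return np.maximum(v, 0.0)

def softmax(v):
    e = np.exp(v - v.max())   # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
features = rng.standard_normal(128)                   # mock flattened feature vector
h = dense(features, rng.standard_normal((128, 64)) * 0.1, np.zeros(64), relu)
probs = dense(h, rng.standard_normal((64, 2)) * 0.1, np.zeros(2), softmax)

assert probs.shape == (2,)                # one probability per class (normal, abnormal)
assert abs(probs.sum() - 1.0) < 1e-9     # softmax outputs sum to 1
```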

C. EXPERIMENTAL SETUP
This section describes the training, validation, and testing flows used to perform experiments on the TUH-EEG Abnormal dataset, as well as the software and hardware setups that have been used for all the experiments.

1) PROCESS OF TRAINING, VALIDATION, AND TESTING
The standard process for accuracy estimation that has been used for training, validation, and testing sets is depicted in the flowchart in Fig. 4. For many optimizations, this process has been slightly modified. Description of modification in the experimental setup is provided with the description of the corresponding optimization in the Results and Discussion section.

2) SOFTWARE SETUP
In the present investigation, Python and Anaconda have been used for programming and creating virtual environments respectively. Deep learning tasks have been performed using TensorFlow and Keras libraries. Library functions from scikit-learn and scipy have been used for scaling and filtering operations.

III. RESULTS AND DISCUSSION
The computational times for the different phases of the system, and the results of classification, parameter, and hyperparameter optimizations, are reported in this section. In addition, a comparison of accuracies across different preprocessing techniques for EEG signal classification is discussed. The performance comparisons reported in this study are made against [19], as it is the best reported classifier of comparable depth (Table I) trained on the same dataset.

Computational time
The computational times required for data loading and preprocessing, model training, and prediction for a single EEG sample have been reported in Table VI.

Optimization of size of the input signal
In two of the previous studies on the identification of abnormal EEGs [3] [19], a signal duration of 60 seconds, corresponding to 15000 frames at a sampling frequency of 250 Hz, was considered. However, the rationale for selecting only the first 60 seconds was not specified. Therefore, the present study optimized the number of input frames used for classification by thoroughly evaluating classification accuracies over a range of input signal durations (numbers of frames). Fig. 5 depicts the change in test accuracies as the size of the input signal was varied from 5 seconds (1250 frames) to 5 minutes (75000 frames). The results in Fig. 5 indicate that the system attains its peak test accuracy when the input consists of 22500 frames.

Determination of the optimal Trimming Threshold
Inspection of 100 randomly selected EEGs from the original dataset revealed that some frames in the first quarter of the signal length contained visually observable noise. Therefore, the original signal was trimmed by a specific number of frames, referred to as the trimming threshold, to obtain better accuracy. Fig. 6 shows the model's training and test accuracies as the trimming threshold is increased from 0 to 20000 frames. The best accuracies were observed at a trimming threshold of 1250 frames, corresponding to the first 5 seconds of the signal, thereby confirming our assumption. The successive decline in training accuracies (Fig. 6(a)) beyond 1250 frames offers a solid foundation for concluding that the optimal trimming threshold in this study is 1250 frames.

Advantage of Smoothing
Smoothing of an input signal, in general, reduces the time it takes to train the model. The training time is reduced as smoothing removes noise, thereby causing accuracies to converge in fewer epochs. Fig. 7 compares the number of epochs required by the proposed model for convergence of training accuracy with smoothed and unsmoothed signals.

Selection of Savitzky-Golay Filtering
Smoothing of EEG signals, or biomedical signals in general, is a sensitive process. Incorrect application of smoothing can filter out an arbitrary amount of significant information required for classification: by traditional signal processing standards the signal may become smooth, yet crucial information essential for abnormal EEG detection may be lost. Therefore, it is critical to identify and compare reliable smoothing techniques for EEG signals. Analysis of different techniques reveals SavGol and median filtering as the most suitable methods [31]. Both methods respect the local behavior of a signal within a window and provide flexibility in choosing a window size according to the extent of local behavior that needs to be preserved. Fig. 8 compares the two techniques on the train, validation, and test datasets. Ideally, test accuracy should not be used as a criterion for selecting the hyperparameters of a machine learning model; however, performance metrics on the test data can be utilized for selecting the best smoothing technique during preprocessing. Based on the overall comparative results (Fig. 8) and the flexibility of the filter to accommodate local behavior, SavGol filtering can be concluded to be the better technique.

Parameter optimization for the Savitzky-Golay filter
Window size and polynomial order are the two primary parameters of a SavGol filter. They were optimized by evaluating test accuracies over a range of window sizes and polynomial orders. Fig. 9 shows two 3D plots created to visualize the optimization: Fig. 9(a) is a scatter plot, while Fig. 9(b) shows a mesh laid over the 3D points of the scatter plot. The mesh displays the general trend of accuracies with changing window sizes and polynomial orders. The figure shows that the optimized values of window size (W) and polynomial order (N) for the highest accuracy over the test set are 51 and 9 respectively. It depicts an initial drop in accuracy when the filter is applied with small values of polynomial order and window size, followed by a sudden increase in the accuracy of the proposed system when the polynomial order approaches 7. It is also evident from the results that increasing the order of the polynomial beyond 9 does not yield better performance.
Convolution coefficients of the filters, calculated as per the optimized values of W (51) and N (9), are shown in Fig. 10. The figure shows a plot of the filter's coefficients applied to the signal during the preprocessing phase.

Effect of SavGol filtering
A brute-force approach was applied to evaluate the effect of SavGol filtering on classification performance by comparing the accuracies of models trained on filtered and unfiltered inputs. For this comparison, the model architecture in [19] was used. Fifty random pairs of subsets were created from the original training set in an 80%-20% ratio, yielding fifty different training subsets with corresponding validation subsets. The best accuracies were then calculated after training and validating the model independently on each of the 50 pairs, and the average of the 50 best accuracies was computed for the training, validation, and test data. In the present study, this performance metric is termed Average Accuracy. Fig. 11 depicts the comparison of average accuracies of the model with filtered and unfiltered input. A similar comparison of maximum accuracies was made to verify the contribution of SavGol filtering to the most accurate version of the model; again, the maximum accuracy was obtained by independently training and validating the model on the 50 unique training and validation subset pairs. Fig. 12 depicts the relevant results. The results indicate that SavGol filtering improves the overall accuracy of the model.
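The Average Accuracy protocol described above can be sketched as follows; `train_and_score` is a hypothetical stand-in for training the CNN on one split and returning its best accuracy:

```python
import numpy as np

def average_accuracy(X, y, train_and_score, n_splits=50, train_frac=0.8, seed=0):
    """Average of best accuracies over n_splits random 80%-20% train/validation splits.

    train_and_score(X_tr, y_tr, X_va, y_va) is a hypothetical callable standing in
    for training the model on one split and returning its best accuracy.
    """
    rng = np.random.default_rng(seed)
    n_train = int(train_frac * len(X))
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(X))       # fresh random split each round
        tr, va = idx[:n_train], idx[n_train:]
        scores.append(train_and_score(X[tr], y[tr], X[va], y[va]))
    return float(np.mean(scores))

# Mock data and a dummy scorer, just to exercise the protocol
X = np.random.randn(100, 10)
y = np.random.randint(0, 2, 100)
acc = average_accuracy(X, y, lambda *split: 0.8)
assert abs(acc - 0.8) < 1e-12
```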

Hyperparameter Estimation
The number and types of layers, the layer-wise hyperparameters, and the model configuration were determined by observing validation accuracies for different hyperparameter values, intuition from activation maps, and inspiration from older models [19]. The intuitive selection of initial hyperparameter values was also based on the fact that, according to human experts, the alpha rhythm is the most significant background characteristic for classifying an EEG signal as abnormal or normal; therefore, filter sizes and strides were chosen such that the frequency of the alpha rhythm was not lost during feature extraction. The hyperparameters, the number of parameters learnt, and the corresponding output shape for each layer are listed in Table VII. A total of 851,746 parameters are used in the model, of which 851,698 are trainable and 48 are non-trainable. The optimizer configuration is given in Table VIII; its parameters and the type of loss function used were inspired by previous models [19].

Performance of the Proposed System
The proposed EEG classification system has been evaluated against [19], as it is, to the best of our knowledge, the most accurate model of comparable depth and complexity. The average and maximum accuracies of the two models were calculated and compared for the train, validation, and test datasets; the results are depicted in Fig. 14 and Fig. 15. The maximum values of precision, sensitivity, specificity, F1-score, and classification accuracy, the minimum values of loss for the two systems, and the numbers of layers and parameters in the models are presented in Table IX. Based on the reported performance metrics in the table, it can be concluded that the proposed system performs better than [19] in all metrics except a 0.02 drop in precision; all other metrics have improved significantly.
Due to the multitude of levels and the range of abnormalities a subject may suffer from, and the minute differences between them, research in this area is challenging and inherits the limitations of the domain. One such inherent limitation of EEGs is the amount of noise in the signal due to the non-invasive nature of the recording. The proposed system needs further signal processing to provide more meaningful inputs to the classification algorithms. The present study also needs to be extended with classification algorithms for specific kinds of abnormalities in EEG signals, to provide more incisive assistance to medical experts. Like any other medical decision support system, a system inferring the abnormal nature of an EEG must keep improving until it is fully accurate. If the normality of an EEG signal is taken as the positive outcome, then a false positive is a far more worrying medical outcome than a false negative; more research is required to decrease the false positive rate and increase the specificity of the system.

IV. CONCLUSION
The development of classification models for determining the medical state of a human subject from physiological signals is a thriving research domain. Even minor improvements in the accuracies of such models are relevant, as they help improve quality of life. However, most previous studies solving similar problems for physiological signals such as EEGs do not emphasize the preprocessing of the signals. The present work underlines the significance of the preprocessing phase by proposing an independent module for it. Furthermore, we have systematically selected and optimized the components of the preprocessing module, illustrating the stepwise improvement in classification accuracies. As a result, the maximum test accuracy obtained by the proposed system is 82.24%, a significant improvement (~3%) over the previously reported best test accuracy of 79.34% in the base work [19]. Significant improvements in the average accuracies show that the system is reliably accurate, and higher specificity, sensitivity, and F1-score indicate that the system is more effective for biomedical application than the system proposed in [19].
The proposed decision support system can assist human experts in the initial screening and post-diagnosis validation of EEG recordings. The system is being improved for eventual use as an effective medical expert system that carries out brain health evaluation. In addition, a user-friendly clinical interface for the decision support system is being designed. A hybrid setup combining the proposed system with explicitly derived features is also intended, to further improve classification accuracy.

V. FUNDING
No funding was received by the authors for this work.

VI. DECLARATION OF COMPETING INTEREST
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.