Hybrid Approach of EEG Stress Level Classification Using K-Means Clustering and Support Vector Machine

Support vector machine (SVM) algorithms are prevalent in classifying electroencephalogram (EEG) signals for the detection of mental stress at various levels. This study aimed to reduce the subjective bias arising from human stress reactivity by employing a clustering method to pre-label stress levels according to their inherent homogeneity and then applying SVM to classify the stress level. Brainwave signals at the prefrontal cortex (Fp1 and Fp2) were captured from 50 participants in response to the stress induced by a virtual reality (VR) horror video and an intelligence quotient (IQ) test. The power spectral density (PSD) values of the Theta, Alpha, and Beta frequency bands were extracted, and the Wilcoxon signed-rank test was used to show significant differences in absolute power between the resting baseline and the post-stimulus recordings. The extracted features were further clustered into three groups of stress level. The data labelled by the k-means clustering method were fed into SVM to classify the stress levels. The performance of the SVM classifier was validated by 10-fold cross-validation, and the highest accuracy of 98% was obtained using only the Beta-band absolute power at the right prefrontal region (Fp2), on account of the significant changes in Beta activity between the pre- and post-stimulus recordings. In essence, a stress pattern was found in the Beta-band activity of the right prefrontal cortex, which was shown to be significantly more active under stimuli. The hybrid approach of k-means clustering and SVM proved to be an effective method of pre-labelling the stress level that reduces the effect of individual differences in stress response and, in turn, improves the reliability and detection rate of mental stress.


I. INTRODUCTION
Various psychological tests have been devised in research and clinical practice to obtain statistically useful information and measure stress levels, such as the Stress Response Inventory [1], the Holmes-Rahe Stress Inventory [2], the Hamilton Rating Scale for Depression [3] and the Perceived Stress Scale [4]. These assessments are self-reported or clinician-rated and use subjective perceptions and estimations to extract specific information on cognitive, emotional, or behavioral stress responses. However, these methods are subjective and not sensitive enough to capture subtle patterns of mental state. Subjective self-reported stress has been reported to be insufficiently reflected by the respective physiological parameters of stress measurement [5], [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Ludovico Minati.
Compared with self-assessment questionnaires, physiological variables such as cortisol level [7], skin conductivity [8], heart rate [9], blood pressure [10] and electroencephalogram (EEG) signals [11]-[14] serve as additional objective and straightforward ways to measure stress. Its high temporal resolution makes electroencephalography (EEG) a practicable and feasible neuroimaging technique. The combination of EEG experimental designs and signal analysis methods has allowed researchers to study the complex brain structure and analyse different kinds of human brain states [15] in various research contexts. EEG information is useful in medical diagnosis and in the design of treatment modalities or neurotherapy. EEG is often used to investigate patients with neurological disorders such as epilepsy [16] and dementia [17], and in ongoing research on the classification of patients' cognitive states and symptoms.
Each frequency band represents a state of the person. Delta waves tend to be the slowest and the highest in amplitude, and they occur during deep sleep. Theta is associated with an unconscious state and occurs whenever a person is drowsy. Alpha, meanwhile, reflects a state of relaxation without any focus or concern. Beta waves are observed during normal consciousness and active concentration. A frequency band higher than Beta is named Gamma (> 30 Hz) and indicates certain brain diseases [20].
In work on stress recognition based on EEG signals, stress has been reported to be associated with a change in frontal asymmetry [21]. The direction of the asymmetry, with either higher or lower right relative to left frontal brain activity, depended on an individual's underlying factors such as personality, emotion and motivation [22]-[24]. The Alpha asymmetry index was used in [25] to measure the stress levels between the left and right hemispheres based on the prefrontal area of the brain. Right-hemisphere dominance was revealed in subjects with moderate and high levels of stress. Another study showed that Beta-band activity increased at the right frontal region after stress induction [26]. A correlation analysis validated that the energy spectral density (ESD) values of right Alpha and right Beta were significantly correlated with high stress, which supported the association of stress with the right brain hemisphere [27].
Alpha and Beta have been highlighted as important stress indicators. Alpha power decreased at the prefrontal cortex under the stress condition [28], [29]. When subjects were exposed to a stressor, Alpha power was lower than in the resting state, which indicated a negative correlation between the Alpha power of an individual's relaxation and the stress level [30]. In contrast, a stress pattern was demonstrated by high levels of relative Beta power at the anterior temporal side of the human brain [31]. Another research team [32] also observed that the ESD of Alpha decreased and the ESD of Beta increased when subjects were exposed to an external stressor. EEG stress analysis showed an increase in Beta power and a decrease in Alpha power [33], [34] in the regions of the prefrontal cortex [35]. This lobe is correlated with stress, since human and animal experiments indicate that exposure to stress affects processing in the prefrontal cortex, which is known as a stress-susceptible brain area [36].
Generally, the Alpha and Beta frequency bands are broadly studied rhythms of the human brain in response to stress. In addition, one study found the power spectral density (PSD) value of Theta to be positively and significantly associated with the stress condition [37]. The mean value of EEG was reported to increase from the resting condition to the stressful condition, with the increases of Theta power in the frontal regions and Beta power in the occipital regions being statistically significant [38]. An increase of Theta was observed at the frontal midline region during the stress condition compared with the pre-stimulus baseline, and it poses a potential marker for intact prefrontal cortex function [39].
Although several EEG studies have classified stress into different levels, the EEG features were classified according to pre-marked stress levels, that is, the difficulty of the stressors and/or self-perceived questionnaires. Al-Shargie et al. [40] used a mental arithmetic task with three levels of difficulty to induce variations in cortical activity, which were recorded as EEG signals. The stress features induced by the three levels of difficulty were labelled accordingly. By comparing the three levels of stress elicited by the mental arithmetic tasks, the study showed that Alpha power decreased greatly from the first to the second level of stress but increased again from the second to the third level. It was also verified that cortical activation failed at task level three. The questionnaire survey on task load showed that as task difficulty increased, especially at the third level, the engagement of participants decreased significantly [28]. On the other hand, Arsalan et al. [41] had participants prepare and present on an unknown topic and classified the perceived stress into three levels using the score obtained from the Perceived Stress Scale (PSS) questionnaire.
Likewise, Nagar and Sethia [42] used the stress scores calculated from the PSS questionnaire to specify three target stress levels. In fact, the report of the veridical stress state is potentially inaccurate and limited by factors such as unwillingness to appear fragile and a lack of conscious perception [43]. Consequently, results based on self-reported stress labelling and on labelling by task difficulty might be less convincing because they cannot deal with the differences between subjects. Inter-subject variability is apparent and indisputable because the time-variant and subject-specific brain processes rely on the experimental setting and on psychological and neurophysiological factors. Accordingly, clustering has been suggested as an effective way to quantify subjects who share similar or identical EEG signal characteristics [44]. A clustering method was introduced in one study to group the inherent homogeneity of all subjects' stress responses into subgroups by training and testing on various physiological features such as EEG, electrocardiography (ECG), electromyography (EMG), galvanic skin response (GSR) and saturation of peripheral oxygen (SpO2). The study found that a small number of clusters showed a good balance between within-cluster homogeneity and between-cluster heterogeneity [45]. To the best of our knowledge, clustering methods and results based solely on EEG signals and stress remain limited in the literature [46], [47]. The EEG data in these studies were processed using discrete wavelet transformation (DWT) and k-means clustering, followed by the calculation of stress index values of cognitive data and physical data for clustering and for establishing low and high stress levels.
VOLUME 10, 2022
The present study used an EEG signal processing technique with a clustering method to develop a three-level stress classification model. The stress response was triggered through stimuli in a laboratory setting, and features were extracted and investigated to determine the significant stress-related features. Subsequently, clustering was applied to overcome the inter-subject differences and to assign the features to three groups of stress level. Inter-subject variability is apparent and indisputable because the time-variant and subject-specific brain processes rely on the experimental setting and on psychological and neurophysiological factors. Accordingly, clustering has been suggested as an effective way to quantify subjects who share similar or identical EEG signal characteristics [48]. The clustered data with known class labels were then split into training and testing sets to build a classification model using a machine learning algorithm. These approaches were designed to create an EEG-based three-level stress classification.

II. MATERIALS AND METHODS
The systematic framework for EEG-based stress level classification consists of five major phases: data collection, data processing, data clustering, model development, and model evaluation and validation. The process flow of the research design is presented in Figure 1 to provide a clear picture of the approaches employed in this study.

A. DATA COLLECTION
A total of 50 undergraduates and postgraduates from Universiti Teknologi Malaysia, Kuala Lumpur were recruited for the experiment. The group comprised 32 males and 18 females aged from 19 to 38 years. All EEG recordings were acquired using 5 electrodes (Ag/AgCl material), which were attached to the surface of the forehead with conductive gel and connected to the biosignal acquisition software (g.MOBIlab+) to transmit the EEG signals to a PC via Bluetooth. The measurement locations of the five electrodes were fixed at points chosen according to the international standard 10-20 electrode placement system. Fp1, Fp2 and Fpz (ground) were placed at the prefrontal cortex region (forehead area). Fp1 was used for the left side of the forehead and Fp2 for the right side, connected to Channel 1 and Channel 2 respectively. Meanwhile, A1 and A2 were attached to the earlobes as reference points. The impedance of the EEG electrodes was measured to be below 5 kΩ, and the EEG signals were sampled at 256 Hz and stored for offline analysis. The whole experimental procedure took approximately 1 hour.
The first session was an eyes-closed, resting-state EEG recording of 3 minutes, which served as the baseline measurement. This resting baseline was used to determine the difference in EEG between the relaxed state and the subsequent recordings made after stressful conditions. In the second session, the participants wore the VR device to experience a 360-degree horror video for 3 minutes 30 seconds. Another 3 minutes of EEG were recorded in the eyes-closed resting condition immediately after the VR video session ended (post-VR video). The EEG power changes between pre-VR video (eyes-closed resting baseline) and post-VR video were evaluated. This was followed by the IQ test, in which the participants were given only 20 minutes to answer 40 questions of increasing difficulty on a website. Another 3 minutes of EEG were recorded in the eyes-closed resting condition after the IQ test session ended (post-IQ test). The EEG power changes between pre-IQ test (eyes-closed resting baseline) and post-IQ test were evaluated.

B. DATA PROCESSING
The recorded EEG signals were transferred to the computer using Simulink models integrated with MATLAB. The raw EEG signals were stored as MATLAB-formatted data and then imported into Brainstorm, an application built in MATLAB, for processing. One minute was selected from each 3-minute EEG signal as the sample, and a notch filter was applied to remove the 50 Hz power-line noise prior to band-pass filtering. Figure 2 depicts the EEG signals after the band-pass filter was applied to separate the whole frequency range of interest into four sub-bands, namely Delta (0.5-4 Hz), Theta (4-8 Hz), Alpha (8-13 Hz) and Beta (13-30 Hz).
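The filtering chain described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Brainstorm pipeline used in the study: the notch quality factor, the Butterworth design, the filter order and the zero-phase filtering are all assumed, and the test signal is synthetic.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 256.0  # sampling rate in Hz, as stated in the data collection section

# Sub-band edges in Hz, as listed in the text.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def remove_power_line(x, fs=FS, f0=50.0, q=30.0):
    """Suppress 50 Hz power-line noise with a zero-phase IIR notch (q is assumed)."""
    b, a = iirnotch(f0, q, fs=fs)
    return filtfilt(b, a, x)


def band_pass(x, low, high, fs=FS, order=4):
    """Zero-phase Butterworth band-pass filter (design and order are assumed)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def split_into_bands(x, fs=FS):
    """Notch-filter the signal first, then band-pass it into the four sub-bands."""
    clean = remove_power_line(x, fs)
    return {name: band_pass(clean, lo, hi, fs) for name, (lo, hi) in BANDS.items()}


# Synthetic one-minute signal: a 10 Hz (Alpha) tone plus 50 Hz interference.
t = np.arange(0.0, 60.0, 1.0 / FS)
signal = np.sin(2 * np.pi * 10.0 * t) + np.sin(2 * np.pi * 50.0 * t)
sub_bands = split_into_bands(signal)
```

With this synthetic input, almost all of the remaining energy should end up in the Alpha sub-band, which is a quick sanity check for the band edges.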
Since human stress features can be captured in both the time and frequency domains, transformation and decomposition methods that provide both time and frequency information, specifically the wavelet transform, have been widely considered. Several EEG studies have combined statistical parameters from the time domain with wavelet-based features from the time-frequency domain to classify stress. Accuracies above 80% have been reported [40], although one model yielded a lower overall accuracy [14]. A study [49] summarized that each feature extraction method has specific advantages and disadvantages depending on the signal to be analysed, and thus the optimum method might differ for every application. In this study, the characteristics of the signal in the frequency domain are crucial for a better understanding of the effect of a stimulus on the brain signal. A frequency transform is required to describe how the spectral components change in response to a stimulus, rather than to perform continuous, prolonged EEG detection.
The Fourier transform is a mathematical mapping between a signal in the time domain and its spectrum in the frequency domain. The Discrete Fourier Transform (DFT), as the name suggests, is the discrete version of the Fourier transform, a reversible mapping for time series that calculates the spectrum of a finite-duration signal. The algorithm transforms a signal from the time domain into its frequency-domain components, which assists signal analysis such as power spectrum analysis. The Fast Fourier Transform (FFT) is an implementation of the DFT that efficiently reduces the computation time [50].
With the periodogram calculated using the FFT algorithm, power spectral density (PSD) can be determined.
The periodogram, however, suffers from large variance and low statistical precision. Welch presented a modified periodogram averaging approach that reduces the variance of the PSD estimate [51]. The fundamental principle of the procedure is to divide the time series into segments, calculate a modified periodogram for each segment, and average the modified periodograms [52]. In this study, the algorithm was used to yield PSD values for the Delta, Theta, Alpha and Beta frequency bands. The time sequence of each frequency band was divided into 50% overlapping segments and the data within each segment were windowed. The Fourier transform of each windowed segment was computed to obtain its periodogram. The PSD for each frequency band was finally obtained by averaging the periodograms. The PSD according to Welch is given by the following equations [52]:
$$x_i(n) = x(n + iD), \quad n = 0, 1, \ldots, L-1, \quad i = 0, 1, \ldots, M-1$$
where iD is the starting point of the i-th sequence of the input signal vector. Successive segments are D units apart; in this study D = L/2, i.e., successive data segments overlap by 50%. The length of each segment is L and M denotes the number of overlapped segments. Each sequence x_i(n) is weighted by a nonrectangular window w(n), and the modified periodogram P_i(f) of each windowed segment is computed using the following formula:
$$P_i(f) = \frac{1}{LU} \left| \sum_{n=0}^{L-1} x_i(n)\, w(n)\, e^{-j 2 \pi f n} \right|^2$$
where U is the normalization factor for the power in the window function:
$$U = \frac{1}{L} \sum_{n=0}^{L-1} \left| w(n) \right|^2$$
U denotes the mean power of the window w(n), so LU denotes the energy of the window function w(n) of length L. Based on the modified periodogram of each segment, Welch's spectral estimate, or PSD, of each frequency band is obtained by averaging the M modified periodograms:
$$P_{\mathrm{Welch}}(f) = \frac{1}{M} \sum_{i=0}^{M-1} P_i(f)$$
In total, 8 PSD values were extracted from the four frequency bands for channels Fp1 and Fp2 per participant and per EEG recording session.
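As a hedged illustration of the Welch procedure described above, the following NumPy sketch segments a signal with 50% overlap, windows each segment, and averages the modified periodograms. The Hann window, the segment length of 512 samples and the synthetic test signal are assumptions made for the example; the text does not state which window or segment length was used. The band power helper follows the description of absolute power as the average of the PSD values within a band.

```python
import numpy as np


def welch_psd(x, fs, seg_len):
    """Welch PSD estimate with 50% overlap (D = L/2) and an assumed Hann window."""
    x = np.asarray(x, dtype=float)
    L = seg_len
    D = L // 2                      # hop between the starts of successive segments
    M = (len(x) - L) // D + 1       # number of overlapping segments
    if M < 1:
        raise ValueError("signal is shorter than one segment")
    w = np.hanning(L)               # nonrectangular window w(n); Hann is an assumption
    U = np.mean(w ** 2)             # mean power of the window
    periodograms = np.empty((M, L // 2 + 1))
    for i in range(M):
        segment = x[i * D:i * D + L] * w
        # Modified periodogram P_i(f) of the i-th windowed segment.
        periodograms[i] = np.abs(np.fft.rfft(segment)) ** 2 / (L * U)
    freqs = np.fft.rfftfreq(L, d=1.0 / fs)
    return freqs, periodograms.mean(axis=0)   # average over the M segments


def band_power(freqs, psd, low, high):
    """Mean of the PSD values whose frequency lies in [low, high)."""
    mask = (freqs >= low) & (freqs < high)
    return float(psd[mask].mean())


# Synthetic one-minute signal at 256 Hz: a 10 Hz tone in weak white noise.
fs = 256.0
rng = np.random.default_rng(0)
t = np.arange(0.0, 60.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.standard_normal(t.size)

freqs, psd = welch_psd(x, fs, seg_len=512)
theta = band_power(freqs, psd, 4.0, 8.0)
alpha = band_power(freqs, psd, 8.0, 13.0)
beta = band_power(freqs, psd, 13.0, 30.0)
```

The estimate here is left in the normalization of the equations above; dividing by the sampling rate would convert it to power per hertz if that unit is needed.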
Features in the four frequency bands are particularly important for characterizing different brain states. Delta-band activity is prominent in early developmental stages and is mostly observed during sleep, so it was excluded from further analysis. The parameters further derived from the PSD values were the mean absolute powers of Theta, Alpha and Beta, each calculated as the average of all the PSD values within its frequency range across all the subjects. The recorded EEG signals of the 50 subjects at resting baseline, post-VR and post-IQ were averaged and evaluated respectively. The EEG absolute power obtained from Welch's FFT was tested for normality using the SPSS statistical software; the data were found to be non-normally distributed, and thus non-parametric analysis was used [53]. The Wilcoxon signed-rank test was conducted to compare two related conditions, specifically the EEG power changes between pre- and post-VR and between pre- and post-IQ. The z-score was calculated to describe the deviation from the mean in units of standard deviation. The equation is shown below,
$$Z = \frac{X - \mu}{\sigma}$$
where Z is the z-score, X is the value of the element, µ is the mean of the population, and σ is the standard deviation.
In this study, the two-tailed p-value served as the scalar criterion for feature selection. The differences in EEG responses between pre- and post-stimuli were considered statistically significant if the p-value was less than 0.05. Electrode channels for which the null hypothesis was rejected (p < 0.05) were kept. A statistically significant difference implied that the two groups were derived from different stress levels.
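The paired comparison described above can be illustrated with a small, self-contained sketch of the Wilcoxon signed-rank test using the normal approximation, which yields a z-value and a two-tailed p-value. This is not the SPSS procedure used in the study: the sketch applies no tie or continuity correction, and the paired values below are made up purely for illustration.

```python
import math


def wilcoxon_signed_rank(pre, post):
    """Return (z, two-tailed p) from the normal approximation of the test."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]   # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1       # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of positive ranks
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))     # two-tailed p-value
    return z, p


# Hypothetical absolute power values for ten subjects (illustration only).
pre = [10, 12, 9, 11, 14, 13, 8, 15, 10, 12]
post = [11, 14, 12, 15, 19, 19, 15, 23, 19, 22]

z_value, p_value = wilcoxon_signed_rank(pre, post)
significant = p_value < 0.05   # feature kept when the null hypothesis is rejected
```

In this made-up example every subject's power increases, so the positive rank sum takes its maximum value and the feature would be retained under the p < 0.05 criterion.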

C. DATA CLUSTERING
The selected statistically significant features were imported into Weka, a machine learning software package, to apply the k-means clustering algorithm. K-means groups objects into k groups based on their attributes. To cluster a given dataset, the number of clusters k to be generated is specified first. K points are chosen randomly from the existing data as cluster centres, and each instance is assigned to its closest cluster centre according to the Euclidean distance metric. Next, the centroid, or mean, of each cluster is calculated and used as the new cluster centre, followed by the reassignment of all instances to the closest cluster centre. The process iterates until the algorithm converges, that is, until the cluster centres no longer change. The objective function is described as follows [54]:
$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| X_i^{(j)} - C_j \right\|^2$$
The algorithm aims at minimizing the function J, which is known as the squared error function. K-means applies an iterative refinement method to produce its final clustering based on the dataset and the number of clusters defined by the user, represented by the variable k. The term $\| X_i^{(j)} - C_j \|^2$ is the chosen distance measure, the (squared) Euclidean distance between data point $X_i^{(j)}$ and cluster centre $C_j$. It is an indicator of the distance of the n data points from their respective cluster centres. In this study, the number of clusters was set to 3, corresponding to low, moderate and high levels of stress.
Each cluster was associated with a centroid and every feature point was allocated to the nearest centroid.
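The iterative procedure described above can be sketched in NumPy as below. This is an illustrative implementation rather than Weka's: the random restarts, the fixed seed, the synthetic feature values and the rule that names clusters low, moderate and high in ascending order of their centroids are all assumptions made for the example.

```python
import numpy as np


def k_means(X, k=3, n_restarts=10, max_iter=100, seed=0):
    """Lloyd's k-means; returns (labels, centres, J) of the best restart."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        # Choose k distinct data points at random as the initial centres.
        centres = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _ in range(max_iter):
            dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)          # nearest centre per instance
            new_centres = centres.copy()
            for j in range(k):
                if np.any(labels == j):           # keep old centre if cluster is empty
                    new_centres[j] = X[labels == j].mean(axis=0)
            if np.allclose(new_centres, centres):
                break                             # centres no longer change
            centres = new_centres
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        J = float(dist[np.arange(len(X)), labels].sum())   # squared error function
        if best is None or J < best[2]:
            best = (labels, centres, J)
    return best


def to_stress_levels(labels, centres):
    """Name clusters by ascending centroid of the first feature (an assumption)."""
    order = np.argsort(centres[:, 0])
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))
    names = np.array(["low", "moderate", "high"])
    return names[rank[labels]]


# Synthetic single-feature data for 50 hypothetical subjects.
rng = np.random.default_rng(1)
features = np.concatenate([
    rng.normal(1.0, 0.2, 17),
    rng.normal(3.0, 0.3, 25),
    rng.normal(6.0, 0.4, 8),
])

labels, centres, J = k_means(features, k=3)
levels = to_stress_levels(labels, centres)
```

The resulting `levels` array plays the role of the class labels that are passed on to the classifier in the next phase.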

D. MODEL DEVELOPMENT
Subsequently, the cluster-labelled data were fed into the Support Vector Machine (SVM) algorithm to classify the stress level. The fundamental concept of the SVM classifier lies in forming an optimum hyperplane that can recognize and separate two different classes based on the extracted features. A key choice in SVM is the type of kernel function. The kernel function defines the feature space and the mapping characteristics, which are critical to non-linear classification and regression in SVM. For instance, SVM with radial basis function (RBF) and polynomial kernels serves as part of the optimization process owing to its ability to automatically determine the number of centres, their positions, and their weights. Several studies reported that SVM with a polynomial kernel achieved the best classification accuracy and performed better than the average SVM with an RBF kernel [55]-[57]. In this study, the polynomial kernel was chosen, with its equation (4) as given in [58]. The parameters within the kernel function must be tuned and optimized in order to produce the best performance. Here, the parameter d in (4) represents the degree of the polynomial kernel, which controls the flexibility of the classifier. The lowest degree, d = 1, corresponds to the linear kernel, which is not an ideal selection for non-linear features. d = 2 yields a decision boundary flexible enough to differentiate between two classes with a hyperplane [59]. The polynomial kernel with d = 3 was reported to have the lowest classification error [56] and improved performance [60], although it undoubtedly requires a relatively longer computation time to find the optimal values of the kernel parameters.
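For orientation, a widely used form of the polynomial kernel is K(x, y) = (γ x·y + c)^d. The sketch below implements that generic form; the scale γ and offset c are assumptions for illustration, and the exact form given in [58] or used by Weka may differ in these constants. Only the role of the degree d is taken from the text.

```python
import numpy as np


def polynomial_kernel(X, Y, degree=3, gamma=1.0, coef0=1.0):
    """Gram matrix of the generic polynomial kernel (gamma * <x, y> + coef0) ** degree."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    return (gamma * (X @ Y.T) + coef0) ** degree


# Two hypothetical two-feature samples.
a = [[1.0, 2.0]]
b = [[3.0, 4.0]]

cubic = polynomial_kernel(a, b, degree=3)                 # (1*3 + 2*4 + 1) ** 3
linear = polynomial_kernel(a, b, degree=1, coef0=0.0)     # reduces to the dot product
```

Setting the degree to 1 with a zero offset reduces the kernel to the plain dot product, which is the sense in which d = 1 corresponds to the linear kernel.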

E. MODEL EVALUATION AND VALIDATION
Cross-validation was used as the optimization procedure, and the appropriate parameter values were calculated during model training. The 10-fold cross-validation method was selected to tune the parameters of the polynomial kernel and to better evaluate the performance of the model. The dataset is first divided into 10 distinct subsets. In turn, one subset is tested using the classifier trained on the remaining (10 - 1) subsets. The process iterates until each subset has served as the test set once. The cross-validation result, that is, the performance of the classification model, was evaluated using the confusion matrix. A confusion matrix is an N-by-N matrix, where N is the number of target classes.
A confusion matrix reports the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) [61]. TP and TN both indicate correct predictions. TP means that the predicted positive class matches the actual positive class, whereas TN means that the predicted negative class matches the actual negative class. FP and FN indicate incorrect predictions. In detail, FP, also known as the type 1 error, means that the actual class was negative but the model predicted a positive class. In contrast, FN, the type 2 error, means that the actual class was positive but the model predicted a negative class. From the calculated TP, TN, FP and FN, a wealth of related performance metrics and classification statistics can be extracted for each class separately, namely accuracy, TP rate (TPR), FP rate (FPR), precision or positive predictive value (PPV), F-measure, Matthews correlation coefficient (MCC), receiver operating characteristics (ROC) and precision-recall curve (PRC) [61].
Accuracy represents the overall percentage of correctly classified instances out of the total number of instances. TPR, also known as recall or sensitivity, is the proportion of positive instances that are correctly classified out of the total number of actual positive instances in a dataset. FPR is the ratio between the negative instances wrongly categorized as positive and the total number of actual negative instances. Precision measures the performance of positive predictions, that is, the fraction of instances classified as positive that are truly positive. F-measure represents the combined performance of precision and recall, namely their harmonic mean. MCC is a balanced measure of the correlation and dependence between the actual and predicted classes. In addition, ROC is a graphical approach that plots TPR against FPR for analyzing the performance of a classifier, whereas PRC is a graph of the relationship between precision and recall, with recall on the x-axis and precision on the y-axis.
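The fold rotation described above can be sketched with the standard library alone. The shuffling and the fixed seed are assumptions made for the example; tools such as Weka may additionally stratify the folds by class, which this sketch does not do.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) so that each fold is the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # assumed random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # k disjoint subsets
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# 50 instances, one per participant, split into 10 folds of 5.
splits = list(k_fold_indices(50, k=10))
```

Each pass trains on 45 instances and tests on the remaining 5, so every instance contributes exactly one prediction to the pooled confusion matrix.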

A. STATISTICAL ANALYSIS RESULT
The involvement of the prefrontal region between pre- and post-stimulus was observed through the changes in the absolute power of Fp1 and Fp2. Fp1 and Fp2 were statistically analysed to sort out the relevant features and choose the optimal subset of features that may provide maximum classification accuracy. The PSD features, particularly absolute power, were compared between the pre- and post-stimuli conditions by means of the non-parametric Wilcoxon signed-rank test. Feature selection was based on the significant results produced by the Wilcoxon signed-rank test: features that were significant at p < 0.05 were selected. The Wilcoxon signed-rank test was employed as a filter approach by ranking and assessing the z-values and p-values of the features extracted from the VR horror video and IQ test sessions. The overall statistical analysis is shown in Table 1. Table 1 shows the mean values of the EEG absolute power changes of the two channels for all participants for post-stimuli. The differences of Fp1 and Fp2 in Theta, Alpha and Beta power between the pre- and post-stimuli recording stages were compared in the final Wilcoxon analysis. From the overall statistical analysis of all the EEG electrodes summarized in the table, the p-values of Fp1 and Fp2 were significant for the Theta frequency band at post-IQ, while for the Beta frequency band the p-value of Fp2 was significant at both post-stimuli. The overall result demonstrated that Theta and Beta power responded more significantly to stress than Alpha power. Specifically, the electrode at the right prefrontal region, Fp2, was highly sensitive to stress in Theta and Beta power, as indicated by the p-values.

B. K-MEANS CLUSTERING
The significant features extracted in the above step were passed to the k-means clustering method to divide the subjects into different categories. In this study, the number of clusters was set to 3, corresponding to low, moderate and high levels of stress. Each cluster was associated with a centroid and every feature point was allocated to the nearest centroid. The tables below indicate the clustering assignments for the selected features. Table 2 displays the clustering assignment for the absolute power of the Theta band (Fp1), where 11 subjects, 27 subjects and 12 subjects were clustered into low, moderate and high stress respectively. Table 3 indicates that 34 subjects, 12 subjects and 5 subjects were clustered into low, moderate and high stress respectively using Theta-band absolute power (Fp2). Based on the Beta absolute power (Fp2) in Table 4, 17 subjects had low stress, 25 subjects were moderately stressed, and 8 subjects were highly stressed.
Next came the feature combination of Theta-band absolute power (Fp1 and Fp2). Table 5 shows the result for this feature set, where 27 subjects, 15 subjects and 8 subjects were clustered into low, moderate and high stress respectively. Table 6 shows that 14 subjects, 25 subjects and 11 subjects were clustered into low, moderate and high stress respectively using Theta (Fp1) and Beta (Fp2) absolute power. Table 7 shows the clustering assignment for the absolute power of the Theta band (Fp2) and Beta band (Fp2), where 15 subjects, 23 subjects and 12 subjects were clustered into low, moderate and high stress respectively.
Finally, Table 8 shows the combination of all the features mentioned above, namely Fp1 and Fp2 of Theta-band absolute power and Fp2 of Beta-band absolute power. 17 subjects had low stress, 24 subjects had moderate stress and 9 subjects were highly stressed.

C. SVM CLASSIFICATION
This section discusses the SVM classifier with the polynomial kernel function (degree = 3) and its accuracies under 10-fold cross-validation. Table 9 shows the classification accuracy in the detection of stress level for the different feature sets.
According to the table, the Beta-band absolute power (Fp2) alone achieved the best accuracy of 98% and therefore formed the best classification model. The feature set combining the absolute power of the Theta band (Fp2) and the Beta band (Fp2) achieved the second-highest accuracy of 94%. However, the Theta absolute power (Fp1) alone produced the lowest accuracy of 66%.
There is no benchmarking literature available on a hybrid unsupervised and supervised approach to classify EEG signals into three levels of stress. Therefore, the three-level stress classification accuracy obtained in this research is compared with previous similar research as a benchmark [11], [40]-[42]. The comparison is between previous studies using SVM and the current research, which combines k-means clustering and SVM, with the best accuracy for three-class stress classification as the evaluation factor.
Al-Shargie et al. [40] employed WT and SVM with Error-Correcting Output Code (ECOC) to build a classification model based on Alpha PSD which had achieved 94.79% for the average classification performance at three levels of stress. Arsalan et al. [41] exploited Welch's FFT to extract various features including PSD and classify the features from Theta frequency band into two-level and three-level of stress using SVM, NB and MLP classifier. They found that the highest accuracy was 92.85% when MLP was used to classify two-level stress, and 64.28% of accuracy was produced when performing three-level stress classification. Whereas SVM classifier achieved 46.42% for the three levels of stress classification.
Jun and Smitha [11] applied FFT to obtain PSD, specifically the ratio of the relative difference of Beta power and Alpha power, as the feature and fed it into an SVM classifier with 4-fold cross-validation to achieve a classification accuracy of 75% at three-level stress. Nagar and Sethia [42] extracted PSD ratios of four frequency bands, namely Delta, Theta, Alpha and Beta, and classified stress levels into two and three classes using SVM and KNN with 10-fold cross-validation. The accuracy of KNN's three-level stress classification was 74.43%, yet SVM was not used for three-level stress classification in that study because its two-class accuracy was low at 52.3%. In a nutshell, the proposed system improves on these existing approaches by providing an accuracy of 98% using Beta power from the right prefrontal cortex.

D. 10-FOLD CROSS VALIDATION
The confusion matrix summarizes the prediction results and the performance of the three-level stress SVM classification using the absolute power of the Beta band (Fp2). As shown in Table 10, all 17 actual instances from the low stress class were correctly classified as low stress. This corresponded to 34% of all 50 instances, and the percentage of correct classification for this class was 100%. For the moderate stress class, 25 instances, accounting for 50% of all instances, were correctly classified, and the percentage of correct classification for this class was also 100%. Of the 8 actual instances in the high stress class, 7 instances (14%) were correctly classified and only 1 instance (2%) was misclassified as moderate stress. Based on the confusion matrix, the three-level stress classification yielded 98% correct predictions and 2% misclassifications overall.
Table 11 reports the summary of the SVM model output evaluated by 10-fold cross-validation. All the basic evaluation measures of classification were derived from the confusion matrix. TPR, or equivalently recall, reports the rate of true positives, which are the instances correctly classified as a given class. The highest value was found for the low and moderate stress classes, followed by the high stress class. Furthermore, FPR reports the rate of false positives, which are the instances falsely classified as a given class. The non-zero value for the moderate stress class reflects the misclassified instance from the high stress class: the instance that should have been classified as high stress was instead classified as moderate stress and therefore counted as a false positive for that class. The value 0 for the low and high stress classes indicates that no instances from other classes were predicted as belonging to them.
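The counts described above are sufficient to recompute the overall accuracy and the per-class TPR and FPR. The sketch below rebuilds the confusion matrix from the text (rows are actual classes, columns are predicted classes) and treats each class one-vs-rest; the variable and function names are illustrative, and the resulting values are derived from these counts rather than copied from Table 11.

```python
CLASSES = ["low", "moderate", "high"]

# Rows = actual class, columns = predicted class, as described for Table 10.
CONFUSION = [
    [17, 0, 0],   # actual low
    [0, 25, 0],   # actual moderate
    [0, 1, 7],    # actual high: one instance predicted as moderate
]


def one_vs_rest_counts(matrix, c):
    """Return (TP, FP, TN, FN) for class index c against all other classes."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[c][c]
    fn = sum(matrix[c]) - tp                   # actual c, predicted otherwise
    fp = sum(row[c] for row in matrix) - tp    # predicted c, actually otherwise
    tn = total - tp - fn - fp
    return tp, fp, tn, fn


total = sum(sum(row) for row in CONFUSION)
accuracy = sum(CONFUSION[i][i] for i in range(len(CLASSES))) / total

rates = {}
for i, name in enumerate(CLASSES):
    tp, fp, tn, fn = one_vs_rest_counts(CONFUSION, i)
    rates[name] = {"TPR": tp / (tp + fn), "FPR": fp / (fp + tn)}

print(accuracy)
for name in CLASSES:
    print(name, rates[name])
```

Under these counts only the moderate class has a non-zero FPR (1 of the 25 instances that are not moderate), and only the high class has a TPR below 1 (7 of 8).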
Apart from that, FPR affected the precision of each class. The value 1 for the high stress class indicated perfect precision because all instances predicted as high stress were truly high stress and no instances from other classes were found among them. However, the precision of the moderate stress class was lower because the misclassified high-stress instance was counted among its predictions. Next, a combined measure of precision and recall was calculated as the F-score, which aims to capture both properties and balance the concerns of precision and recall. Similar to precision and recall, the poorest F-measure score is 0 and a perfect F-measure score is 1. A perfect F-measure score was found for the low stress class, given its perfect precision and recall. Moreover, the MCC describes the correlation between the actual and predicted classes. A high score is produced only if the prediction obtains good results in all four cells of the confusion matrix, namely TP, FP, TN and FN. The result therefore reflects the quality of this multiclass classifier's predictions in a confusion matrix context. A coefficient of 1 for the low stress class represented a perfect prediction, followed by the moderate and high stress classes. For the ROC area, the largest area was found for the low stress class: with an FPR of 0 and a TPR of 1, its ROC area was 1. In short, the high TPR and low FPR of the three classes show that positive instances were predicted almost perfectly, indicating good classification performance. For the PRC area, a value closer to 1 indicates higher precision and recall. The low stress class produced the highest PRC area, outperforming the other classes. Overall, the low false positive and false negative rates of the three classes imply that most instances were labelled correctly by the SVM classifier.
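For reference, precision, F-measure and MCC can be computed per class from one-vs-rest counts. The sketch below uses the counts implied by the confusion matrix described in the text (17 low, 25 moderate and 7 high instances correct, with 1 high instance predicted as moderate). The function names are illustrative, and the values are derived from these counts rather than taken from Table 11.

```python
from math import sqrt

# One-vs-rest counts (TP, FP, TN, FN) implied by the confusion matrix in the text.
COUNTS = {
    "low": (17, 0, 33, 0),
    "moderate": (25, 1, 24, 0),   # the misclassified high instance is its only FP
    "high": (7, 0, 42, 1),        # the same instance is its only FN
}


def precision(tp, fp, tn, fn):
    return tp / (tp + fp)


def recall(tp, fp, tn, fn):
    return tp / (tp + fn)


def f_measure(tp, fp, tn, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient for one class against the rest."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom


metrics = {
    name: {
        "precision": precision(*c),
        "recall": recall(*c),
        "F": f_measure(*c),
        "MCC": mcc(*c),
    }
    for name, c in COUNTS.items()
}

for name, m in metrics.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
```

With these counts the low class scores 1 on every measure, the moderate class has a precision of 25/26, and the MCC ordering is low, then moderate, then high, which matches the ordering described above.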

IV. CONCLUSION
The stress response of the participants was assessed by measuring Theta, Alpha and Beta absolute power. Theta and Beta absolute power showed a significant increase in both stress conditions. The Wilcoxon signed-rank test reported the p-values used to identify the significant features. In detail, Theta at the Fp1 (p < 0.001) and Fp2 (p < 0.015) electrodes showed a significant difference at post-IQ, while Beta at the Fp2 electrode showed a significant difference at post-VR (p < 0.024) and post-IQ (p < 0.011). These significant features were then passed to k-means clustering to group participants according to the inherent homogeneity of their stress response. Surprisingly, SVM with a polynomial kernel classified the data into the corresponding low, moderate and high stress levels using only the feature of Beta absolute power (Fp2), producing the highest accuracy of 98% compared with the other feature sets. Its performance was assessed in terms of TP rate, FP rate, precision, recall, F-measure, MCC, ROC area and PRC area. Overall, the experimental results revealed that classification using the hybrid approach of k-means clustering and SVM reduced the individual differences in stress response caused by the multivariate relationship between stress and human physiology, and confirmed that incorporating the clustering method can improve mental stress detection. The accuracy of three-level stress classification in this study is higher than that of methods without clustering [11], [40]-[42]. Although the high scores look promising, the findings need to be verified in future studies. The statistical analysis also needs to be corrected for multiple comparisons in future work.
This study is a good starting point for deeper exploration of minimal EEG channels in the development of real-time stress classification, since the Fp2 electrode at the right prefrontal region was found to be highly sensitive to stress. In addition, self-reported stress labels can be compared with cluster-based stress labels in future work, since the cluster analysis in this study was aimed at removing the bias of subjective labelling. The current study has a limitation in its use of k-means clustering prior to SVM classification. Since the k-means algorithm relies on the means of clusters, its centroids can be dragged and influenced by outliers and noisy data. Hence, outlier detection should be added to the clustering process to identify and remove outliers [62]. Apart from that, it is recommended that future studies apply clustering algorithms with outlier detection to larger datasets in order to improve the quality and adaptability of outlier detection and to obtain more comprehensive and reliable results. A larger sample size would allow more robust separation of the different stress levels and an examination of how gender differences relate to stress levels. Furthermore, different groups of individuals, such as the young and the elderly, should be included and tested in order to better understand and model stress in real life.