Autonomous System for EEG-Based Multiple Abnormal Mental States Classification Using Hybrid Deep Neural Networks Under Flight Environment

Detection of pilots' mental states is particularly critical because their abnormal mental states (AbSs) could cause catastrophic accidents. In this study, we presented the feasibility of classifying various specific AbSs (namely, low fatigue, high fatigue, low workload, high workload, low distraction, and high distraction) by applying a deep learning method. To the best of our knowledge, this study is the first attempt to classify multiple AbSs of pilots. We proposed hybrid deep neural networks with five convolutional blocks and two long short-term memory layers for decoding multiple AbSs, and we designed the model to extract informative features from electroencephalography signals. A total of ten pilots conducted the experiment in a simulated flight environment. Compared with five conventional models, our proposed model achieved the highest grand-average accuracy of 68.04 (±5.26)% for classifying seven mental states across all subjects, at least 6.55% higher than that of the conventional models. Our proposed model could distinguish and classify the low and high levels of each state category and give appropriate feedback to the subjects. In addition, we found nine indicators that showed statistically significant differences between two mental states ($p < 0.05$). Hence, we believe that this work will contribute significantly to advances in autonomous driving or autopilot based on artificial intelligence technology in the future.

In this study, we focused on the AbS in a pilot's environment. Many studies have shown that a pilot's mental capabilities can affect flight safety because controlling an aircraft is a challenging task that consumes a substantial amount of energy [23], [24], [25]. Indeed, more than 70% of aviation accidents are caused by human error [26]. Fatigue is commonly caused by a prolonged cognitive task, especially a repetitive or boring one [24]. According to the British Airline Pilots' Association, 280 of 500 commercial pilots said that they had fallen asleep during the night. Workload is the mental cost required to achieve a given task [15]. The task difficulty and workload of pilots can be influenced by various interruptions during flight. The Human Factors division of the National Aeronautics and Space Administration reported research indicating that as a pilot's workload increases, the ability to deal with flight-based information decreases [27]. Distraction occurs when pilots do not concentrate on the flight task. The Civil Aviation Authority of New Zealand reported that distraction during flight is the major cause of errors leading to accidents [28].
Recent studies have investigated how to detect the AbS using various physiological signals, such as the electroencephalogram (EEG), electrooculogram (EOG), electrocardiogram (ECG), respiration, and electrodermal activity. Hong et al. [29] decoded the drowsy state from EEG, ECG, and photoplethysmography signals using machine learning methods. They achieved the highest kappa value of 0.98 for detecting the drowsy state using a random forest classifier. Horrey et al. [30] recognized distraction using near-infrared spectroscopy, heart monitoring, and eye-tracking systems. Drivers showed less variability in lane-keeping and headway maintenance in both auditory conditions, but response times to critical braking events were longer in the interesting audio condition. However, wearing multiple devices to measure various physiological signals makes application in real-world environments difficult.
Among the various physiological signals, EEG signals are considered promising for classifying mental states because they directly reflect the actual condition of human mental states [31]. For this reason, a few research groups have studied detecting the AbS using only EEG signals. Wu et al. [32] proposed a nonparametric prior induced deep sum-logarithmic-multinomial mixture model for detecting pilot fatigue through a developed brain power map, with an adaptive topic-layer stochastic gradient Riemann Markov chain Monte Carlo inference method for learning its global parameters without heuristic assumptions. Lee and An [33] designed a deep neural network for EEG-based drowsiness detection in multiple consciousness states (awake, sleep, and drowsiness). The outputs of a convolutional neural network (CNN) and a long short-term memory (LSTM) network were concatenated, and they achieved an accuracy of 85.6% for classifying the three consciousness states. Yang et al. [34] proposed a novel complex network-based broad learning system for EEG-based fatigue detection and demonstrated that their method could accurately differentiate the fatigue state from the alert state with high stability. Reddy et al. [35] proposed a spatio-spectral optimized fuzzy-independent phase-locking value representation of EEG signals for monitoring users' cognitive states, analyzing car drivers' EEG synchronization changes as they drifted between alert and drowsy states. Zhang et al. [36] designed a concatenated structure of deep recurrent and three-dimensional CNNs (R3DCNNs) to learn EEG features across different tasks for recognizing workload; the R3DCNNs achieved an average accuracy of 88.9%. Zheng et al.
[37] proposed an individual-independent workload estimator, a cascade ensemble of multilayer autoencoders, to tackle individual differences within EEG features. Their method showed acceptable accuracy and computational complexity compared with several conventional workload classifiers. Beltrán et al. [38] designed an automatic framework for detecting distraction using a BCI in a realistic driving simulator. Using EEG signals, they achieved F1-scores of 0.839 with a binary model and 0.730 with a multiclass model, improvements of 0.07 and 0.08 in binary and multiclass classification, respectively. Sonnleitner et al. [20] recognized distraction from EEG signals, showing that reaction times and the α spindle rate increased with time-on-task; the grand-average classification error across all subjects was 8%. Reaction time is one of the important variables to consider when detecting the AbS. However, many factors, such as decreased concentration or spacing out, can affect reaction time.
Thus, AbS detection based on EEG signals has a variety of strengths: 1) consideration of the spectral, spatial, and temporal information of the signals; 2) use of shorter signal lengths that still contain significant information compared with other biological signals; and 3) direct reflection of the current status of users. However, due to the nonstationary characteristics of EEG signals [41], [42], recognition of each particular AbS has so far been achieved through binary classification (e.g., fatigue or not, workload or not, and distraction or not). In this study, we focused on developing a novel framework that allows the detection of multiple specific AbSs. Hence, we could classify the AbSs with high performance.
The main contributions of this study are as follows.

A. Subject
A total of ten subjects [S1-S10, nine males and one female, aged 25.6 (±0.52) years] participated in our experiment, all of whom had over 100 h of flight experience in the Taean Flight Education Center, an affiliated educational organization of Hanseo University. In the BCI domain, differences between the brain signals of males and females exist [43]; therefore, it is important to maintain the gender balance of the subjects. However, data provided by the Ministry of Land, Infrastructure, and Transport of the Republic of Korea show that only 352 of the 8734 people (4%) who obtained aircraft pilot licenses in the Republic of Korea from 2012 to 2016 were female. Hence, we had difficulty recruiting female subjects. No subject had a history of psychiatric or neurological disorders, and all were naïve BCI users. The subjects were informed of the entire experimental protocol and then gave consent according to the Declaration of Helsinki. The experimental protocols and environment were reviewed and approved by the Institutional Review Board of Korea University [1040548-KU-IRB-18-92-A-2]. After finishing each experiment, the subjects were asked to complete a questionnaire about their physical and mental condition in order to evaluate the experimental paradigms.

B. Experimental Environment
We used a flight simulator of the Cessna 172 model (Garmin, Olathe, KS), as shown in Fig. 1. The simulator included the cockpit, the screen, a keypad, and the signal amplifier. The cockpit consisted of a monitor display, a flight yoke, and other control panels to build a realistic flight environment. A keypad was used for inputting numbers; it was attached directly to the flight yoke to minimize external noise caused by movement when the subjects pressed the keypad. We used a signal amplifier (BrainAmp, Brain Products GmbH, Germany) to measure the EEG and EOG signals. The sampling frequency of the EEG and EOG signals was 1000 Hz, and a 60 Hz notch filter was applied to remove power supply noise. Thirty EEG channels were placed on the scalp according to the international 10/20 system. In addition, four EOG channels were attached along the vertical and horizontal lines around the eyes. The reference and ground electrodes were placed at the FCz and AFz positions, respectively. The impedance of the electrodes was kept below 10 kΩ by injecting conductive gel before the experiment.

C. Experimental Protocol and Paradigm
The experimental paradigms were designed to effectively induce the pilot's fatigue, workload, and distraction while recording EEG signals, as shown in Fig. 2. We designed the paradigms carefully and conducted the experiments in a strict environment to acquire EEG signals of fatigue, workload, and distraction independently. The entire experiment was conducted in the order of the distraction, workload, and fatigue experiments. We provided a 20 min break between experiments to allow the subjects to rest sufficiently and to remove any carryover from the task given in the previous experiment. After each experiment, we administered a questionnaire to confirm whether the effects of the previous experiment remained and to check the physical and mental condition of the subjects. If, despite the 20 min break, the subjects indicated that the effects of the previous experiment remained or that their physical and mental condition was not suitable for participation, we provided an additional break. In addition, the tasks included in the three experiments consisted of monotonous tasks that were not difficult for the pilots, since EEG signals may be contaminated by various factors in the case of complex tasks. We did not provide the subjects with extra time to practice the tasks, since the intended AbS may not occur once the subjects become accustomed to the task given in each experiment.
1) Fatigue: The subjects performed a monotonous flight during nighttime for 1 h to induce drowsiness, with initial settings of 3000 feet (altitude), 0° (heading), and 100 knots (velocity). A beep sound appeared every minute to signal the timing for pressing one number on the keypad, as shown in Fig. 2(a). The beep we provided every minute was a quiet, soft sound of 40 dB, not a sharp sound. Sounds of 48 dB or less do not induce an awake state [44]; therefore, the 40 dB beep did not interfere with the induction of drowsiness. The subjects entered their Karolinska sleepiness scale (KSS) rating, which is regarded as a significant index of the subjective drowsiness level. The KSS consists of nine levels. In addition, the number of responses at each KSS level differs across subjects, since the subjects evaluate their drowsiness level subjectively.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
7) Level 7: Sleepy, but no difficulty remaining alert. 8) Level 8: Sleepy, some effort to keep alert. 9) Level 9: Extremely sleepy. If the subjects did not input the KSS or could not operate the aircraft successfully, that period was regarded as level 9. We defined levels 1-3 as NS, levels 4-6 as LF, and levels 7-9 as HF. Because the fatigue experiment was the last of the three experiments, the accumulated tasks and experimental time could sufficiently induce fatigue in the subjects. In addition, after the fatigue experiment, we administered a questionnaire asking about the subjects' condition, and all subjects mentioned that they felt fatigued without any disturbance.
2) Workload: The predefined conditions of the aircraft in the workload-inducing experiment were 3000 feet, 0°, and 100 knots; in other words, the flight was already in the air. Various flight instructions were given to the subjects by an instructor seated next to them to induce changes in the pilot's workload. The instructions consisted of three levels based on the complexity of the given task, as follows.
1) Level 1: Flight maintaining a given altitude, disregarding heading and velocity. 2) Level 2: Flight maintaining a continuously given altitude, heading, and velocity. 3) Level 3: Steep turn with a bank angle while maintaining velocity toward the given heading. There were 15 trials for level 1 and ten trials each for levels 2 and 3, with a task execution length of 60 s in all cases. The length of the instruction was 10, 20, and 30 s for levels 1, 2, and 3, respectively. The length of the rest after task execution was 10 s regardless of level, as shown in Fig. 2(b). Pass or fail was marked by the instructor. We defined level 1 as LW and levels 2 and 3 as HW.
3) Distraction: While the subjects performed a simulated flight under the predefined aircraft conditions (3000 feet, 0°, and 100 knots), prerecorded sentences (length: 4-22 words) from air traffic control (ATC) messages were presented at regular time intervals. Counting the number of words in the ATC message while maintaining the predefined aircraft conditions induces distraction in the subjects. The distraction level was divided into three levels according to the length of the ATC message, as follows.
3) Level 3: 15-22 words. The subjects were instructed to count the number of words in the ATC message without body movement. A beep sound indicated when it was time to enter the number of words using the keypad. This experiment consisted of two sessions, and after finishing the 1st session, the subjects rested for 3 min. The number of trials for each level was 40, and the trial length was 10 s. In addition, a rest interval of 4 s was allowed after each response, as shown in Fig. 2(c). We defined level 1 as LD and levels 2 and 3 as HD. Since counting words is not normally performed when pilots aviate an aircraft, most pilots are not familiar with counting the words contained in auditory stimuli. Because distraction is effectively induced by unfamiliar tasks, we instructed the subjects to count the number of words contained in the auditory stimuli.
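The level-to-class mappings described above (KSS levels for fatigue, instruction levels for workload, and ATC-message levels for distraction) can be summarized in a small labeling helper. This is a sketch; the function name and structure are illustrative, not from the paper.

```python
# Illustrative mapping of experiment levels to the seven mental-state
# classes used in this study (NS, LF, HF, LW, HW, LD, HD).
# The function name `label_sample` is hypothetical.

def label_sample(paradigm: str, level: int) -> str:
    """Map a paradigm level to a mental-state class label."""
    if paradigm == "fatigue":           # KSS levels 1-9
        if level <= 3:
            return "NS"                 # normal state
        return "LF" if level <= 6 else "HF"
    if paradigm == "workload":          # instruction levels 1-3
        return "LW" if level == 1 else "HW"
    if paradigm == "distraction":       # ATC-message levels 1-3
        return "LD" if level == 1 else "HD"
    raise ValueError(f"unknown paradigm: {paradigm}")

print(label_sample("fatigue", 8))       # HF
print(label_sample("workload", 1))      # LW
print(label_sample("distraction", 3))   # HD
```

Note that the fatigue paradigm is the only one whose level 1 maps to the normal state (NS); the workload and distraction paradigms label every trial as a low or high AbS.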

D. Signal Preprocessing
Preprocessing of the EEG signals was conducted using the BBCI toolbox [45] in MATLAB 2019a. The EEG signals were band-pass filtered between 1 and 50 Hz using a second-order zero-phase Butterworth filter and downsampled from 1000 to 100 Hz. To obtain high-quality EEG signals, independent component analysis (ICA) [46] was applied to remove components contaminated by the pilot's unnecessary body movements, such as eye blinks and head movements. The EOG channels were used to identify eye-related artifacts, which were then removed using ICA. Each trial of the data acquired from the three paradigms was segmented into 1 s windows without overlap [31]. Therefore, 3600 samples (60 samples × 60 trials) were obtained for each subject in the fatigue experiment, and samples from the workload and distraction experiments (the latter comprising the 1st and 2nd sessions) were obtained for each subject in the same manner. Across all subjects, 36 000, 24 500, and 18 600 samples were obtained for the three mental states, respectively. Because the difference in the total number of samples acquired for each mental state creates a data imbalance problem, we analyzed the data according to the state with the smallest total number of samples.
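The filtering, downsampling, and segmentation steps above can be sketched with SciPy. This is a minimal sketch, not the BBCI toolbox pipeline: the ICA-based artifact removal is omitted, and the downsampling is a simple decimation, which is acceptable here only because the preceding 1-50 Hz band-pass already limits content to the new Nyquist frequency.

```python
# Sketch of the preprocessing chain: 1-50 Hz zero-phase Butterworth
# band-pass, downsampling 1000 -> 100 Hz, and segmentation into
# non-overlapping 1 s windows. ICA artifact removal is omitted.
import numpy as np
from scipy.signal import butter, filtfilt

FS_RAW, FS_OUT = 1000, 100

def preprocess(eeg: np.ndarray) -> np.ndarray:
    """eeg: (channels, samples) at 1000 Hz -> (n_windows, channels, 100)."""
    b, a = butter(2, [1, 50], btype="bandpass", fs=FS_RAW)
    filtered = filtfilt(b, a, eeg, axis=-1)       # zero-phase filtering
    down = filtered[:, :: FS_RAW // FS_OUT]       # decimate to 100 Hz
    n_win = down.shape[-1] // FS_OUT              # number of full 1 s windows
    down = down[:, : n_win * FS_OUT]
    return down.reshape(down.shape[0], n_win, FS_OUT).transpose(1, 0, 2)

eeg = np.random.randn(30, 60 * FS_RAW)            # 60 s of 30-channel EEG
windows = preprocess(eeg)
print(windows.shape)                              # (60, 30, 100)
```

Each 1 min fatigue trial thus yields 60 one-second windows, consistent with the 60 samples × 60 trials count above.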

E. MentalNet
We proposed MentalNet to accurately detect and classify mental states using EEG signals, as shown in Fig. 3. In conventional models, various EEG features (such as spectral, spatial, and temporal information) have been used to train deep learning models. Hence, we adopted a hybrid deep learning framework: we constructed deep convolutional blocks to extract various spatiotemporal features and applied an LSTM network after the convolutional blocks to account for the time-series characteristics of the data. The specification of the proposed MentalNet is described in more detail in Table I.
The spatiotemporal CNN in our proposed MentalNet was designed as a deep CNN framework composed of five convolutional blocks to extract various high-level features. Convolutional blocks I, II, and III each have two convolutional layers with 1×5 filters for extracting temporal features, a 1×1 stride, and a batch normalization layer, with 32, 64, and 128 feature maps, respectively. Convolutional blocks IV and V each have three convolutional layers with 5×1 and 3×1 filters, respectively, for considering the spatial information, a 1×1 stride, and a batch normalization layer, with 128 and 256 feature maps, respectively. Maximum-pooling and average-pooling layers are applied, respectively, to avoid overfitting. The exponential linear unit (ELU) was used as the activation function in convolutional block V [47]:

$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha(e^{x}-1), & x \leq 0 \end{cases} \quad (1)$$

The output filtered through the five convolutional blocks was assigned as the input of the LSTM block. We used the LSTM network, a kind of recurrent neural network, to solve the long-term dependency problem in time-series data; LSTM networks have been widely used in physiological-signal-based studies of mental states [31], [48]. The layer processes sequential instances of the input data from the first time instance to the last. Each LSTM memory cell processes the input data $X = (x_1, x_2, x_3, \ldots, x_n)$ over $n$ time steps as below. The LSTM block of our proposed model consisted of two LSTM layers with 256 and 128 hidden units, respectively:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad (2)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad (3)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad (4)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \quad (5)$$
$$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t \quad (6)$$
$$h_t = o_t \circ \tanh(c_t) \quad (7)$$

In the above equations [49], at time $t$, using the input $x_t$ and the previous hidden state $h_{t-1}$, the memory cell selects what to keep or forget from the previous states using the forget gate $f_t$ (2). The memory cell computes the current state $c_t$ in two steps. First, the cell calculates a memory cell candidate state $\tilde{c}_t$ (5). Next, using the previous cell state $c_{t-1}$ and the input gate $i_t$ (3), the cell decides how much information to write into the current state $c_t$ (6). The output gate $o_t$ (4) decides how much of the information $h_t$ (7) will be transferred to the next cell. $W$ and $U$ denote weight matrices or weight vectors, $b$ denotes biases, $\sigma$ is the logistic function, and $\circ$ is the Hadamard product operator. The classification block, the last part of our model, consists of three fully connected layers and a softmax layer; the numbers of hidden units of the first and second fully connected layers are given in Table I.
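The gating computations (2)-(7) can be traced with a minimal NumPy implementation of a single LSTM memory-cell step. This is a pedagogical sketch: the weight sizes and random initialization are arbitrary, not the trained MentalNet parameters.

```python
# One LSTM memory-cell step in NumPy, following eqs. (2)-(7):
# gates f_t, i_t, o_t, candidate state, cell state, and hidden state.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """W, U, b are dicts keyed by 'f', 'i', 'o', 'c'."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate (2)
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate (3)
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate (4)
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate (5)
    c_t = f_t * c_prev + i_t * c_tilde                          # cell state (6)
    h_t = o_t * np.tanh(c_t)                                    # hidden state (7)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 8, 4                     # toy sizes (the paper uses 256 and 128)
W = {k: rng.normal(size=(n_hid, n_in)) * 0.1 for k in "fioc"}
U = {k: rng.normal(size=(n_hid, n_hid)) * 0.1 for k in "fioc"}
b = {k: np.zeros(n_hid) for k in "fioc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                     # unroll five time steps
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
print(h.shape)
```

Because the hidden state is a sigmoid-gated tanh, every component of $h_t$ stays within [-1, 1], which keeps the recurrent dynamics stable over long sequences.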


A. Performance Evaluation
We evaluated the performance using various metrics: accuracy, precision, recall, and F1-score. We applied fourfold cross-validation to evaluate classification accuracy fairly: the dataset for each mental state was randomly shuffled and divided into four parts, of which three were used as the training set and one as the test set [53]. Table II presents the classification results for the various mental states for each subject. The grand-average classification accuracy was 68.04% across the seven mental states. S6 showed the highest classification accuracy of 78.94%, and S10 recorded the lowest performance of 61.23%. We confirmed significantly higher classification performance compared with the chance-level accuracy, and we also found that our proposed model worked effectively for all subjects, given that the performance variation between subjects was only 5.26 (standard deviation). Based on the F1-score, the average value of the high condition was higher than that of the low condition for each state (e.g., HF versus LF). This confirms that the proposed MentalNet extracted the high-level features of brain dynamics better under the high conditions than under the low conditions.
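The fourfold cross-validation scheme can be sketched as follows. This is a structural sketch only: the "classifier" is a majority-vote placeholder on synthetic labels, not MentalNet, and the data are random.

```python
# Sketch of fourfold cross-validation: shuffle the samples, split into
# four parts, train on three parts and test on the remaining one.
import numpy as np

def fourfold_indices(n_samples, seed=0):
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, 4)                  # four roughly equal folds

X = np.random.randn(200, 30, 100)                  # 200 EEG segments (toy data)
y = np.random.default_rng(1).integers(0, 7, 200)   # 7 mental-state labels

accs = []
for test_idx in fourfold_indices(len(X)):
    train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
    # Placeholder "model": predict the most frequent training label.
    pred = np.bincount(y[train_idx]).argmax()
    accs.append(float(np.mean(y[test_idx] == pred)))
print(f"mean accuracy over 4 folds: {np.mean(accs):.3f}")
```

In the actual evaluation, the per-fold accuracies would be averaged per subject and then across subjects to obtain the grand-average accuracy reported in Table II.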
In addition, Table III shows a comparison of classification performance, with statistical analysis, between the conventional models and the proposed model for recognizing various mental states. For fair and diverse performance evaluations, we clustered representatively used mental-state classes into groups. We defined three evaluation groups: Group1: NS and AbS (2-class); Group2: LD, HD, LF, HF, LW, and HW (6-class); and Group3: NS, LD, HD, LF, HF, LW, and HW (7-class).
We compared the accuracies of MentalNet with those of the conventional models used for performance comparison: the power spectral density-support vector machine (PSD-SVM) [50], CNN-I, CNN-II, CNN-III, and CNN-IV. The PSD-SVM extracted the PSD of the δ- (1-4 Hz), θ- (4-8 Hz), α- (8-13 Hz), and β-bands as features but neglected the spatial information of the EEG signals; the SVM was used for classification. CNN-I is the DeepConvNet [51]. It has four convolutional blocks, each consisting of a convolutional layer, a batch normalization layer, and an ELU activation layer; the 1st block extracts both spatial and temporal features, while the other blocks extract temporal features only. CNN-II is our modification of the DeepConvNet with a deeper spatial filter [51], made under the assumption that spatial features would have a significant effect on the data analysis. CNN-III is the EEGNet [52]. It uses depthwise separable convolution, so it has a small number of parameters; its components are two blocks using depthwise convolution, each with two convolutional layers with batch normalization, dropout, and ELU activation. CNN-IV is the multiple feature block-based CNN (MFB-CNN), which has both deep spatial and temporal filters [48]. We wondered how much the temporal features could affect model performance, so the MFB-CNN was selected as one of the comparative models.
The average accuracies of MentalNet were the highest in all evaluation groups compared with those of the conventional models: 91.35 (±2.85)%, 72.85 (±4.44)%, and 68.04 (±5.26)%, respectively. We computed the chance-level accuracies at the significance level α = 0.05 [54]. Compared with the chance-level accuracy of each group, our performance improved by 41.35% (2-class), 56.18% (6-class), and 53.75% (7-class). With MentalNet, S6 showed the highest accuracies of 97.78% and 78.94% in Group1 and Group3, respectively, and S4 showed the highest accuracy of 80.12% in Group2. In contrast, S10 showed the lowest accuracies of 87.80%, 65.91%, and 61.23% in Group1-3, respectively. Among the conventional models, CNN-II showed the highest accuracy in Group1, and CNN-IV showed the highest accuracies in Group2 and Group3. The low performance variability indicates that detection of the various mental states was also stable across all subjects.
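A chance-level accuracy with a significance threshold can be computed from a binomial bound, a common approach for BCI classifiers; the exact procedure of [54] may differ from this sketch, and the trial count here is illustrative.

```python
# Chance-level accuracy with a significance threshold: for n test trials
# and k equiprobable classes, the 95th percentile of the binomial
# distribution gives the accuracy that a random classifier exceeds
# only with probability alpha = 0.05.
from scipy.stats import binom

def chance_level(n_trials: int, n_classes: int, alpha: float = 0.05) -> float:
    return float(binom.ppf(1 - alpha, n_trials, 1.0 / n_classes)) / n_trials

for k in (2, 6, 7):   # the 2-, 6-, and 7-class evaluation groups
    print(f"{k}-class chance level (n=100 trials): {chance_level(100, k):.3f}")
```

The bound is always above the naive 1/k chance rate and shrinks toward it as the number of test trials grows, which is why reporting the significance-corrected chance level matters for small test sets.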
To verify the difference in classification between the conventional methods and the proposed model, the paired t-test was applied with Bonferroni's correction. First, normality and homoscedasticity were validated because of the small number of samples. Normality, assessed with the Shapiro-Wilk test, was satisfied for all models in all groups (the null hypothesis H0 was not rejected), and the assumption of homoscedasticity based on Levene's test was also met for all models in all groups. Our proposed MentalNet showed statistically significant performance differences from all conventional methods, with p-values below 0.05.
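The statistical validation described above can be sketched with SciPy: check normality and homoscedasticity first, then run the paired t-test and apply Bonferroni's correction. The per-subject accuracies below are synthetic placeholders, not the paper's data.

```python
# Sketch of the statistical validation: Shapiro-Wilk (normality),
# Levene (homoscedasticity), then a paired t-test with Bonferroni
# correction across the five model comparisons.
import numpy as np
from scipy.stats import shapiro, levene, ttest_rel

rng = np.random.default_rng(0)
proposed = rng.normal(68.0, 5.3, 10)      # synthetic per-subject accuracies
baseline = rng.normal(61.5, 5.0, 10)
n_comparisons = 5                          # five conventional models

print(f"Shapiro-Wilk p = {shapiro(proposed).pvalue:.3f}")
print(f"Levene       p = {levene(proposed, baseline).pvalue:.3f}")

t, p = ttest_rel(proposed, baseline)       # paired t-test
p_corrected = min(p * n_comparisons, 1.0)  # Bonferroni correction
print(f"t = {t:.2f}, Bonferroni-corrected p = {p_corrected:.4f}")
```

Checking the assumptions before the t-test matters here because, with only ten subjects, departures from normality or unequal variances would invalidate the parametric comparison.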

B. Neurophysiological Analysis From EEG Signals
In our previous study [48], we analyzed the AbS in detail by dividing the frequency range into four bands (δ-, θ-, α-, and β-bands) and the brain into three regions (temporal, central, and parietal). In the temporal region, the activation degree of fatigue was higher in the δ-, θ-, and α-bands than that of the other mental states. In the central region, the notably highest activation degree was observed for workload, represented in the α- and β-bands. In the parietal region, the activation degree of distraction was the highest in the θ-band compared with the other mental states. We show the scalp topographies according to the spectral bands of the EEG signals for the representative subject (S6) in Fig. 4.
The scalp topographies represent the grand-average band power to visualize the brain activation of each mental state. Fig. 4(a) and (b) show the scalp patterns corresponding to classification in Group1 and Group2, respectively. The amplitude was calculated for all EEG channels used and for each frequency band (δ-, θ-, α-, and β-bands). The scalp topographies show that the amplitude was significantly different for each spectral band and brain region in both Group1 and Group2.
As shown in Fig. 4(a), the amplitudes of the scalp distribution between the NS data in each AbS (fatigue, workload, and distraction) dataset and the corresponding AbS data had statistically significant differences in the θ- and α-bands according to the paired t-test (p < 0.05). The amplitude of the θ- and α-bands in the occipital region increased with fatigue. For workload, the amplitude of the θ-band increased in the frontal region, but the amplitude of the α-band decreased in the occipito-parietal region. For distraction, the amplitude of the θ-band in the centro-parietal region and that of the α-band in the central region increased. However, we found no statistically significant differences in the amplitudes of the scalp distribution in the δ- and β-bands.
Fig. 4(b) for Group2 shows spatial tendencies similar to those of Group1. When fatigue increased (state: HF), the amplitude of the θ- and α-bands in the occipital region increased. When the workload level increased (state: HW), the amplitude of the θ-band increased in the frontal region, but the amplitude of the α-band decreased in the occipito-parietal region. Furthermore, when distraction increased (state: HD), the amplitude of the θ-band in the centro-parietal region and that of the α-band in the central region increased. However, we could not find any particular patterns in the δ- and β-bands.
Additionally, we estimated the PSD of the δ-, θ-, α-, and β-bands to calculate the values of the indicators: I1: α/β, I2: θ/β, I3: (θ+α)/β, and I4: (δ+θ)/(α+β). The PSD was calculated for each of the four spectral ranges based on the fast Fourier transform (FFT) [55]:

$$\mathrm{PSD} = 10\log_{10}\left(\int_{f_1}^{f_2} |x(2\pi f)|^{2} \, df\right)$$

where $f_1$ and $f_2$ are the lower and upper frequencies, respectively, and $x(2\pi f)$ was obtained by the FFT; $10\log_{10}(\cdot)$ denotes the unit conversion from microvolts to decibels. The PSD in each frequency band for each brain region is presented in the supplementary document. We calculated the indicators for each state using representative channels of the brain regions [48].
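The indicator computation can be sketched in NumPy: estimate band power from an FFT-based periodogram, then form the ratios I1-I4 defined above. The δ/θ/α band edges follow the text; the β upper edge of 30 Hz is an assumption, since the paper does not state it here.

```python
# Sketch of the I1-I4 indicator computation from FFT-based band power.
# Beta band upper edge (30 Hz) is an assumed value, not from the paper.
import numpy as np

FS = 100
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(x: np.ndarray, fs: int = FS) -> dict:
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)      # simple periodogram
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in BANDS.items()}

def indicators(x: np.ndarray) -> dict:
    p = band_powers(x)
    return {"I1": p["alpha"] / p["beta"],
            "I2": p["theta"] / p["beta"],
            "I3": (p["theta"] + p["alpha"]) / p["beta"],
            "I4": (p["delta"] + p["theta"]) / (p["alpha"] + p["beta"])}

x = np.random.default_rng(0).standard_normal(1000)  # 10 s of one channel
print(indicators(x))
```

By construction I3 = I1 + I2, so the four indicators contain three independent ratios; in practice they would be computed per brain region using the representative channels noted above.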
The significant indicators were calculated from eight of the ten subjects. A lower threshold was set where the classification accuracy fell below the average accuracy (0.68) minus the standard deviation (0.05) (i.e., ≤0.63); accordingly, two subjects (S1 and S10) were excluded as outliers beyond this threshold (the lower 20%). Table IV reveals interesting tendencies among the various indicators. For LF and HF, I4 over the whole region was the highest in all subjects in common; for the other AbSs, no indicator was the highest in common for all subjects. In addition, for LF and HF, I1 in the parietal region was the highest in all subjects except S6. Also, for LD, I4 over the whole region was the highest in all subjects except S3, and for HD, I4 over the whole region was the highest in all subjects except S5.
We focused on finding significant indicators for accurately detecting specific mental states. The mere fact that an indicator's value is high is not meaningful; instead, we regarded indicators showing statistically significant differences between mental states as significant indicators. Applying the paired t-test, we found nine indicators with statistically significant differences (p < 0.05), as shown in Table V. I1 over the whole and central regions was a significant indicator for classifying between the NS and Fatigue, with p-values of 0.0168 and 0.0032, respectively. For distinguishing between the NS and Workload, I1 over the whole region was the most significant, at 0.0414. I1 in the central and parietal regions and I3 in the parietal region were significant indicators for classifying LW and HW, with p-values of 0.0374, 0.0286, and 0.0332, respectively. For distinguishing between LD and HD, I2 over the whole and central regions and I4 in the central region were the most significant, at 0.0396, 0.0330, and 0.0212, respectively. For classifying between the NS and Distraction, no indicator showed a statistically significant difference, but a similar indicator existed: I1 over the whole region was close to the significant indicators, with a p-value of 0.0543. Here, "similar" indicates that the p-value is below 0.055. From this point of view, for distinguishing between LD and HD, I3 and I4 over the whole region and I4 in the parietal region may additionally be used as indicators.

IV. DISCUSSION
This study shows the possibility of classifying multiple specific mental states using only brain signals. The proposed MentalNet classified a total of seven human mental states (namely, NS, LF, HF, LW, HW, LD, and HD) with high performance. Although a few investigators have focused on classifying the AbS, our research is the first to classify the states by separating low and high levels even within one AbS category. This presents the possibility of providing fine-grained feedback to users depending on the low and high levels of the AbS in a real-world environment. In addition, we calculated the indicators, which have mainly been evaluated for fatigue [39], [40], [56], [57], and showed that effective indicators for each brain region can be used not only for fatigue but also for workload and distraction. Hence, we believe this can be a significant contribution to future research on detecting human mental states. Furthermore, since the model was designed using large amounts of data obtained from real pilots, it can contribute to the development of autonomous driving or autopilot based on artificial intelligence technology in the future.
We presented not only the detection of the AbS but also the classification of more specific AbSs by dividing each AbS into two levels. To this end, we designed hybrid deep neural networks with five convolutional blocks and two LSTM layers. We applied the five convolutional blocks to reflect both the temporal and spatial characteristics of EEG signals. As shown in Fig. 4(a), the theta band is the minimum frequency at which differences between the NS and each mental state (NS versus Fatigue, NS versus Workload, and NS versus Distraction) become apparent. To ensure a sufficient receptive field for capturing the theta band, we set the filter size of convolutional blocks I-III to 1×5. Additionally, we set the filter sizes to 5×1 and 3×1 in convolutional blocks IV and V, respectively, to focus on the interchannel relationships of neighboring channels. Recurrent neural networks, such as the LSTM network, are among the effective deep network architectures for recognizing mental states from physiological signals. Since mental states change sequentially over time rather than manifesting as significant features within a short time window, we applied the LSTM network, which can extract significant temporal features. Our proposed MentalNet could thus effectively extract the significant spectral, spatial, and temporal features. We used various performance metrics (accuracy, precision, recall, and F1-score) to evaluate the performance fairly. In particular, regarding precision, as shown in Fig. 4, the amplitudes of the scalp distribution in high AbSs (HF, HW, and HD) are significantly higher than those in low AbSs (LF, LW, and LD). Also, since channels with statistical significance exist in the scalp-distribution amplitudes of high AbSs, the precision values in high AbSs are higher than those in low AbSs.
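To make the architecture description above concrete, the following is an illustrative PyTorch sketch, not the authors' released code: temporal convolutions with 1×5 kernels (blocks I-III), spatial convolutions with 5×1 and 3×1 kernels (blocks IV-V), and two LSTM layers. The channel counts, hidden sizes, and pooling choices are assumptions for illustration only.

```python
# Hedged sketch of a hybrid CNN-LSTM model in the spirit of MentalNet.
# All layer widths and the spatial pooling step are illustrative assumptions.
import torch
import torch.nn as nn

class HybridEEGNet(nn.Module):
    def __init__(self, n_channels=32, n_classes=7):
        super().__init__()
        self.conv = nn.Sequential(
            # Blocks I-III: temporal filters (1x5) along the time axis
            nn.Conv2d(1, 16, kernel_size=(1, 5), padding=(0, 2)), nn.BatchNorm2d(16), nn.ELU(),
            nn.Conv2d(16, 32, kernel_size=(1, 5), padding=(0, 2)), nn.BatchNorm2d(32), nn.ELU(),
            nn.Conv2d(32, 32, kernel_size=(1, 5), padding=(0, 2)), nn.BatchNorm2d(32), nn.ELU(),
            # Blocks IV-V: spatial filters (5x1, 3x1) over neighboring channels
            nn.Conv2d(32, 64, kernel_size=(5, 1)), nn.BatchNorm2d(64), nn.ELU(),
            nn.Conv2d(64, 64, kernel_size=(3, 1)), nn.BatchNorm2d(64), nn.ELU(),
        )
        # Two LSTM layers capture how mental states evolve over time
        self.lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2, batch_first=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):            # x: (batch, 1, channels, time)
        f = self.conv(x)             # (batch, 64, channels', time)
        f = f.mean(dim=2)            # pool over the spatial axis -> (batch, 64, time)
        f = f.permute(0, 2, 1)       # (batch, time, 64) for the LSTM
        out, _ = self.lstm(f)
        return self.fc(out[:, -1])   # classify from the last time step

model = HybridEEGNet()
logits = model(torch.randn(2, 1, 32, 200))   # 2 trials, 32 channels, 200 samples
print(tuple(logits.shape))  # (2, 7): one score per mental state
```

The 1×5 kernels operate only along the time axis, so the early blocks act as learnable spectral-temporal filters, while the later 5×1 and 3×1 kernels mix information across adjacent electrode rows before the LSTM aggregates the sequence.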
We compared the performance with five conventional models, and our proposed model showed the highest classification performance in Group1-3. Among the five conventional models, the CNN-II showed the highest performance in Group1, and the CNN-IV showed the highest performance in Group2 and Group3. The CNN-II is a model with a deeper spatial-filter part of the DeepConvNet, and the CNN-IV is a model with both deep spatial and deep temporal filters. These results indicate that temporal features play an important role when classifying more specific AbSs, which is why our proposed model showed the highest performance compared with the various conventional models.
Most studies using indicators across various mental states are fatigue-related. We focused on this point and planned to use various indicators in our study. For this reason, we estimated the indicators using not only the fatigue dataset but also the workload and distraction datasets. In addition, since a high indicator value does not by itself make an indicator meaningful, we identified the statistically significant indicators by conducting the paired t-test with Bonferroni's correction on the differences in indicator values between mental states. We found nine statistically significant indicators in four out of six groups. In the NS versus Fatigue group, the I1 at the whole and central regions showed statistically significant differences, and in the NS versus Workload group, the I1 at the whole region showed a statistically significant difference. Additionally, in the LW versus HW group, three of sixteen indicators showed statistically significant differences: the I1 at the central and parietal regions and the I3 at the parietal region. Also, in the LD versus HD group, three indicators showed statistically significant differences: the I2 at the whole and central regions and the I4 at the central region. Through these results, we have demonstrated the possibility that the four indicators (I1, I2, I3, and I4) can be used in various mental-state studies.
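The screening procedure described above, a paired t-test per indicator/region pair with a Bonferroni correction, can be sketched as follows. The data here are random stand-ins, not the study's measurements, and the count of 16 indicator/region combinations is taken from the LW-versus-HW comparison in the text.

```python
# Minimal sketch of paired t-tests with Bonferroni correction across
# 16 indicator/region combinations. Values are synthetic placeholders.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
n_subjects, n_tests = 10, 16          # ten pilots, 16 indicator/region pairs
alpha = 0.05

low_state  = rng.normal(1.0, 0.2, size=(n_subjects, n_tests))   # e.g., LW values
high_state = rng.normal(1.2, 0.2, size=(n_subjects, n_tests))   # e.g., HW values

significant = []
for k in range(n_tests):
    t, p = ttest_rel(low_state[:, k], high_state[:, k])   # paired: same subjects
    p_corrected = min(p * n_tests, 1.0)                   # Bonferroni: scale by test count
    if p_corrected < alpha:
        significant.append((k, p_corrected))

print(f"{len(significant)} of {n_tests} indicators survive correction")
```

A paired test is appropriate because both conditions are measured in the same pilots; the Bonferroni scaling guards against false positives when many indicator/region pairs are tested at once.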

A. Conclusion
Accurate detection of various mental states is an important and challenging issue. Conventional studies have attempted to detect the AbS accurately using various physiological signals. In particular, detecting each category of the AbS using EEG showed high performance, but classification among AbSs within various categories still showed limitations. In this work, we proposed a model that can classify various AbSs. Our proposed MentalNet could distinguish and classify low and high levels in depth for each status category and give appropriate feedback to the subjects. Also, we demonstrated for the first time which indicators are significant for detecting each AbS. Previous studies using indicators for mental states were fatigue-related, but we found indicators that can be used across various AbSs. Furthermore, since the dataset was collected from highly disciplined pilots, we believe that it will contribute significantly to future autonomous driving or autopilot advances.

B. Future Works
There are still some issues that remain. First, the experiments were conducted on pilots who actually operate airplanes. The subjects therefore proceeded as if they were being trained to aviate the aircraft, which made it very difficult to collect abnormal conditions because the highly disciplined pilots consistently remained in the NS. Thus, we believe that our dataset is valuable, and we aim to construct a large-capacity database of EEG signals by complementing it with paradigms that better induce the AbS in the future. We also plan to develop a user-independent system applicable to all subjects.
In addition, in the BCI domain, there are differences between brain signals obtained from males and females, but our dataset is not gender-balanced, so we plan to construct a gender-balanced dataset in the future. If our model is evaluated on a gender-balanced dataset, it will be better applicable to a real-world environment. In a real-world environment, several types of mental states can occur simultaneously. To further utilize our technology in a real-world environment, we will design an experimental paradigm that induces complex mental states and improve our MentalNet to classify them. Finally, it would be valuable to analyze our dataset using various novel feature extraction methods based on graph theory [58], since such methods could provide reliable features for distinguishing cognitive skills and emotional states.
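As a rough illustration of the graph-theoretic direction mentioned above (an assumption on our part, not a method from this paper), one simple connectivity feature treats each EEG channel as a node, links channels whose signals correlate strongly, and summarizes each node by its degree.

```python
# Illustrative sketch: a correlation-based channel graph and node-degree
# features. The threshold of 0.1 and the data are arbitrary placeholders.
import numpy as np

rng = np.random.default_rng(1)
eeg = rng.normal(size=(8, 500))          # 8 channels x 500 time samples

corr = np.corrcoef(eeg)                  # channel-by-channel correlation matrix
adjacency = (np.abs(corr) > 0.1).astype(int)
np.fill_diagonal(adjacency, 0)           # no self-loops

degree = adjacency.sum(axis=1)           # one connectivity feature per channel
print(degree.shape)                      # one value per channel
```

Richer graph measures (clustering coefficient, path length, modularity) follow the same recipe: estimate pairwise coupling, threshold it into a graph, and compute node- or network-level statistics as features.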

Fig. 1 .
Fig. 1. Experimental environment for signal acquisition under a simulated flight environment.

Fig. 2 .
Fig. 2. Experimental paradigms for inducing a pilot's various mental states in a simulated flight environment. (a) To induce fatigue, the subjects performed a monotonous flight during night time. (b) For workload, various flight instructions were given to the subjects by the instructor according to levels 1-3 to induce changes in the pilot's workload. (c) For distraction, a part of the prerecorded sentences (length: 4-22 words) of the ATC message was presented at a regular time interval. NS indicates the normal state.

Fig. 3 .
Fig. 3. System overview for the proposed MentalNet-based AbS detection. EEG signals were measured under a simulated flight environment. The mental states were defined as the NS, low fatigue, high fatigue, low workload, high workload, low distraction, and high distraction.

Algorithm 1: Training Procedure of the MentalNet
• Input: Preprocessed EEG data X = {x_i}_{i=1}^{D}, x_i ∈ R^{C×T}: training data for mental states, where D is the total number of trials, C the number of channels, and T the number of time points; class labels {O_i}_{i=1}^{D}, where O_i ∈ {NS, HF, LF, HW, LW, HD, LD}
• Output: Trained MentalNet
• Step 1: Pre-training the model for NS and AbS
1: Input X_bin: a set of training EEG data
2: Input binary class labels {O_bin}, where O_bin ∈ {NS, AbS}
3: Initialize the parameters of the pre-training model to random values and modify the class labels to binary values (NS and AbS)
4-5: …
6: Output X_bin: network weights and loss values (binary)
• Step 2: Training the model for the specific AbS
7: Input X_bin: a set of features from the pre-trained model
8: Input multi-class labels {O_tr}, where O_tr ∈ {HF, LF, HW, LW, HD, LD}
9: Initialize the parameters of the model to random values and modify the class labels to multi-class values (HF, LF, HW, LW, HD, and LD)
10-11: …
12: Output X_N: network weights and loss values (multi-class)
• Step 3: Fine-tune parameters
13: Minimize the loss values by tuning the parameters of both the binary and multi-class models

[…] were 128 and 64, respectively. The last fully connected layer's output is fed to a 2-, 6-, or 7-way softmax, which produces a class-label distribution over 2 classes (NS and AbS), 6 classes (LF, HF, LW, HW, LD, and HD), or 7 classes (NS, LF, HF, LW, HW, LD, and HD). The overall training procedure of the MentalNet is summarized in Algorithm 1.
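The three steps of Algorithm 1 can be sketched as a standard transfer-learning loop: pre-train a shared backbone on the binary NS-versus-AbS task, then reuse those weights under a six-way head for the specific AbS levels. The tiny linear backbone, the optimizer settings, and the random data below are placeholders, not the paper's actual configuration.

```python
# Hedged sketch of the two-stage training in Algorithm 1 (binary
# pre-training, then multi-class training on shared features).
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 200, 64), nn.ReLU())

# Step 1: pre-train with a binary head (NS vs. AbS)
binary_head = nn.Linear(64, 2)
opt = torch.optim.Adam(list(backbone.parameters()) + list(binary_head.parameters()))
x = torch.randn(8, 32, 200)                 # 8 trials, 32 channels, 200 samples
y_bin = torch.randint(0, 2, (8,))
loss = nn.CrossEntropyLoss()(binary_head(backbone(x)), y_bin)
opt.zero_grad(); loss.backward(); opt.step()

# Step 2: keep the pre-trained backbone, train a 6-way head
# (LF, HF, LW, HW, LD, HD) on the same features
multi_head = nn.Linear(64, 6)
opt = torch.optim.Adam(list(backbone.parameters()) + list(multi_head.parameters()))
y_multi = torch.randint(0, 6, (8,))
loss = nn.CrossEntropyLoss()(multi_head(backbone(x)), y_multi)
opt.zero_grad(); loss.backward(); opt.step()

# Step 3: fine-tune (e.g., minimize both losses jointly); here we just
# confirm the multi-class head produces one score per specific AbS
print(tuple(multi_head(backbone(x)).shape))  # (8, 6)
```

Pre-training on the coarse binary distinction gives the backbone features that already separate normal from abnormal states, so the harder six-way problem starts from a better initialization than random weights.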

TABLE I. Specifications of the proposed model: details of parameters and layer implementations.

TABLE II. Performance measurements (accuracy, precision, recall, and F1-score) according to each mental state by using the MentalNet.

TABLE III. Comparison of the classification performances with the statistical analysis for detecting various mental states among the conventional models and the proposed model. Three different evaluation groups were defined: Group1 (2-class): NS and AbS; Group2 (6-class): LF, HF, LW, HW, LD, and HD; and Group3 (7-class): NS, LF, HF, LW, HW, LD, and HD.

TABLE V. Representation of the significant indicators between binary mental states for each brain region (whole (wh.), central (cent.), temporal (temp.), and parietal (pari.) regions) by using the statistical analysis.