Continuous EEG Decoding of Pilots’ Mental States Using Multiple Feature Block-Based Convolutional Neural Network

Non-invasive brain-computer interface (BCI) has been developed for recognizing and classifying human mental states with high performances. Specifically, classifying pilots’ mental states accurately is a critical issue because their cognitive states, which are induced by mental fatigue, workload, and distraction, may be fundamental in catastrophic accidents. In this study, we present an electroencephalogram (EEG) classification of four mental states (fatigue, workload, distraction, and the normal state) from EEG signals in both offline and pseudo-online analyses. To the best of our knowledge, this study is the first attempt to classify pilots’ mental states using only EEG signals during continuous decoding. We recorded EEG signals from seven pilots under various simulated flight conditions. We proposed a multiple feature block-based convolutional neural network (MFB-CNN) with temporal-spatio EEG filters to recognize the pilot’s current mental states. We validated the proposed method for two analyses across all subjects. In the offline analysis, we confirmed the classification accuracy of 0.75 (±0.04). Also, in the pseudo-online analysis, we obtained the detection accuracy of 0.72 (±0.20), 0.72 (±0.27), and 0.61 (±0.18) for fatigue, workload, and distraction, respectively. Hence, we demonstrate the feasibility of classifying various types of mental states for implementation in real-world environments.


I. INTRODUCTION
Brain-computer interface (BCI) is one of the innovative technologies used for communication between humans and devices by recognizing users' status and intention. In particular, non-invasive BCI, one of the practical BCI techniques, has attracted a lot of attention owing to its non-invasive nature and low cost [1]-[7]. Hence, non-invasive BCI systems to control a robotic arm [8]-[10], a wheelchair [11], and a BCI speller [5], [12], [13] have been developed. Recently, non-invasive BCI has been investigated for recognizing human mental states, such as fatigue [14]-[18], workload [14], [19]-[21], and distraction [22]-[24], with robust detection performances. In particular, binary classification of each state (fatigue or not, workload or not, and distraction or not) using physiological signals has been developed for autonomous driving systems (e.g., vehicles and aircraft) that rely on a high level of artificial intelligence (AI) techniques [15], [25]. Hence, recent BCI advances have focused on classifying the users' mental state because it can directly reflect the actual condition of the user [26], [27].
The associate editor coordinating the review of this manuscript and approving it for publication was Filbert Juwono.
Fatigue is commonly caused by a prolonged cognitive task, especially a repetitive or boring task [28], [29]. According to the British Airline Pilots' Association (BALPA), 56% of approximately 500 commercial pilots responded that they usually fell asleep during a flight. In addition, mental workload is defined as the mental cost required to achieve a given task [14]. The task difficulty and mental workload of pilots can be influenced by various interruptions during a flight. The division of the National Aeronautics and Space Administration (NASA) for human factors in aerospace published research indicating that as the pilot's workload increased, the ability to deal with flight-based information decreased [30]. Also, distraction occurs when pilots cannot concentrate on the flight task. The Civil Aviation Authority (CAA) of New Zealand reported that distraction during flight is the major reason for errors leading to accidents [31].
Recent BCI studies have investigated the classification of each state using various physiological signals, such as those captured by electrooculogram (EOG), electrocardiogram (ECG), functional magnetic resonance imaging [32], functional near-infrared spectroscopy (fNIRS), and electroencephalogram (EEG) signals. Hong et al. [33] detected the drowsy state using machine learning methods with various physiological signals. The highest performance was a kappa value of 0.98 using the random forest method. Liu et al. [19] used EEG and fNIRS signals recorded during the n-back task to evaluate workload. The highest accuracy of 65.1% was observed when fusing all information. Horrey et al. [23] recognized distraction using near-infrared spectroscopy, heart monitoring, and eye-tracking systems. They obtained the highest accuracy of 93.3% when unheard material was rejected, and the method was effective across 31 participants.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
A few groups have conducted experiments using only EEG signals for classifying each mental state. Gao et al. [15] accurately detected fatigue from EEG signals using a spatial-temporal convolutional neural network (CNN) model. They constructed a deep artificial neural network with 14 layers and used a relatively small amount of EEG data while maintaining high performance. In addition, what distinguished their work from other studies was that the number of parameters was reduced by sharing the weight parameters among core blocks, which was the critical factor for improving performance. Blanco et al. [20] evaluated workload using EEG signals. They compared binary classification using k-nearest neighbor (kNN), linear discriminant analysis (LDA), naïve Bayes, decision trees, and support vector machine (SVM), and the highest accuracy of 90.17% was observed using LDA. Sonnleitner et al. [22] recognized distraction from EEG signals using regularized LDA. They used two tasks in their experiments: a driving task and an auditory secondary task. They achieved approximately 8% classification error across 20 subjects.
However, these studies have focused on binary classification for each mental state. Furthermore, most recent studies have not investigated continuous decoding for the direct application of accident prevention yet.
In this study, we designed experimental paradigms for inducing specific mental states under various simulated flight conditions and also conducted both offline and pseudo-online analyses for classifying mental states. Additionally, we proposed a multiple feature block-based convolutional neural network (MFB-CNN) using temporal-spatio EEG filters to accurately classify fatigue, workload, distraction, and the normal state using only EEG signals. We evaluated the proposed method for the offline and pseudo-online analyses across all subjects. We could confirm the feasibility of classifying various mental states for implementation in real-world environments.

A. SUBJECTS
Seven healthy subjects (S1-7, 6 males and 1 female, aged 27.8 (±1.4) years) with over 100 hr. of flight experience at the Taean Flight Education Center, an affiliated education organization of Hanseo University, participated in our experiment. The experimental environment and protocols were reviewed and approved by the Institutional Review Board at Korea University [1040548-KU-IRB-18-92-A-2]. Before the experiment, all subjects were informed about the experimental protocols and gave their consent in accordance with the Declaration of Helsinki. They were asked to refrain from drinking alcohol and coffee and to sleep for 6-8 hr. the day before the experiment. At the end of the experiment, the subjects completed questionnaires regarding the recognition of their status and evaluation of the paradigms.
Fig. 1(a) shows the experimental environment for acquiring the pilots' EEG and EOG signals. The simulator system presented the flight environment of a Cessna 172 aircraft. The environment included a cockpit, screen, wireless keypad, and signal amplifier. The cockpit consisted of a monitor display, flight yoke, and control panels. The screen offered a 210° view so that the pilots could see outside the aircraft. A wireless keypad was installed on the flight yoke for recording the pilots' responses as the input. We used the signal amplifier (BrainAmp, Brain Products GmbH, Germany) to measure EEG and EOG signals [34]. We set the sampling frequency to 1,000 Hz and used a 60 Hz notch filter to remove power-line noise.

C. EXPERIMENTAL PARADIGMS
We designed the experimental paradigms to effectively induce pilots' fatigue, workload, and distraction based on previous works [15], [22], [31], [35]. The EEG signals related to the pilots' mental states were recorded during simulated flights following these paradigms.

1) FATIGUE
The subjects performed a monotonous nighttime flight with the initial setting values of 3,000 feet (height), 0° (heading), and 100 knots (speed) for 1 hr. to induce drowsiness. The subjects entered the Karolinska sleepiness scale (KSS) value, a significant index of subjective drowsiness level [36], every minute using the keypad. KSS consists of nine levels (1: extremely alert, 2: very alert, 3: alert, 4: rather alert, 5: neither alert nor sleepy, 6: some signs of sleepiness, 7: sleepy, but no difficulty remaining alert, 8: sleepy, some effort to keep alert, 9: extremely sleepy). We considered levels 1 to 6 as the normal state and levels 8 and 9 as fatigue [37]. Level 7 was excluded to obtain a clearer distinction between the groups. See Fig. 2(a) for more details.
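The KSS-based labeling rule described above can be sketched as follows. This is a minimal illustration rather than the authors' code; the function name `kss_to_label` and the string labels are hypothetical.

```python
from typing import Optional


def kss_to_label(kss: int) -> Optional[str]:
    """Map a Karolinska sleepiness scale rating (1-9) to a class label.

    Levels 1-6 are treated as the normal state, levels 8-9 as fatigue,
    and level 7 is excluded (returned as None), following the text.
    """
    if not 1 <= kss <= 9:
        raise ValueError(f"KSS rating must be in 1..9, got {kss}")
    if kss <= 6:
        return "normal"
    if kss == 7:
        return None  # excluded to keep the two groups clearly separated
    return "fatigue"


# One rating is entered per minute; keep only the minutes with a usable label.
ratings = [3, 6, 7, 8, 9]
labeled = [(r, kss_to_label(r)) for r in ratings]
kept = [(r, label) for r, label in labeled if label is not None]
```

Here `kept` drops the minute rated 7 and retains the other four minutes with their labels.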

2) WORKLOAD
To induce changes in workload, flight instructions were given by an instructor who sat next to the pilots. The instructions consisted of three levels based on the complexity of the given task: 1) flight maintaining a given altitude while disregarding heading and velocity, 2) flight maintaining continuously given altitude, heading, and velocity, and 3) a steep turn with a given bank angle while maintaining velocity toward a given heading. The experiment consisted of 15 trials in level 1, 10 trials in level 2, and 10 trials in level 3. The instructor marked each trial as 'Pass' or 'Fail' in the Rest interval (Fig. 2(b)).

D. MFB-CNN
We proposed a CNN-based deep learning framework with temporal-spatio EEG filters to classify four mental states in a real-time environment, as shown in Fig. 3. We adopted multiple feature blocks using various EEG features such as the spectral, temporal, and spatial information [38], [39]. For example, depending on the filter size, the CNN architecture could extract both temporal features and spatial information [39]. In addition, we designed a deep convolutional neural network owing to the long EEG recording sessions compared to other BCI paradigms. Table 1 indicates the structure of the proposed MFB-CNN framework in detail.
Initially, EEG signals were down-sampled from 1,000 to 100 Hz, and a zero-phase, 2nd-order Butterworth bandpass filter of 1-50 Hz was applied. To obtain high-quality EEG data, we applied independent component analysis (ICA) [40] to remove the contaminated components generated by the pilots' unnecessary body movements, such as eye blinks and head movements. ICA is simply one of the preprocessing steps to obtain clean EEG data. Since EEG signals are vulnerable to noise, most studies using deep learning or machine learning to detect user intentions or decode mental states have proceeded with feature extraction or classifier training after removing artifacts. Because our experimental environment was a simulated flight in which many artifacts could occur, we first removed artifacts from the EEG signals, then divided the data into the training set and the test set, and finally trained the deep learning framework [41]-[45]. Four EOG channels were used as the contaminated reference signals to reject independent components (ICs). All signal processing was conducted using MATLAB 2019a with the BBCI toolbox [46].
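The text states that four EOG channels served as reference signals for rejecting ICs but does not give the selection criterion. One common way to do this, shown here purely as an assumption, is to flag any component whose absolute Pearson correlation with an EOG reference exceeds a threshold. The function names and the threshold of 0.7 are hypothetical, and the original processing was done in MATLAB, not Python.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(vx * vy)


def artifact_components(ics, eog_refs, threshold: float = 0.7) -> List[int]:
    """Indices of components strongly correlated with any EOG reference.

    The correlation criterion and the threshold are assumptions for
    illustration; the paper does not specify them.
    """
    return [
        i for i, ic in enumerate(ics)
        if any(abs(pearson(ic, ref)) >= threshold for ref in eog_refs)
    ]


# Synthetic demo: a 10 Hz oscillation and a blink-like pulse train at 100 Hz.
n = 200
neural = [math.sin(2 * math.pi * 10 * k / 100) for k in range(n)]
blink = [1.0 if k % 50 < 5 else 0.0 for k in range(n)]
ics = [neural, [-2.0 * b for b in blink]]  # ICA sign and scale are arbitrary
rejected = artifact_components(ics, eog_refs=[blink])
```

Using the absolute correlation matters because ICA recovers components only up to sign and scale, so a blink component may appear inverted.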
After the pre-processing, the input EEG data had the dimensions of the number of EEG channels by the number of sampling points (30 × 300). Seven convolutional blocks were incorporated into a hierarchical CNN to extract high-level features. Each convolutional block had three or four convolution layers and a batch normalization layer with a size of 32. The filter size for extracting temporal features was 1 × 5 and the stride size was 1 × 1. In blocks VI and VII, the filter sizes were 5 × 1 and 3 × 1, respectively, to consider the spatial information. Convolutional blocks V and VII were set with a 0.5 dropout ratio to mitigate overfitting. Max-pooling and avg-pooling layers were used to reduce the number of output features while minimizing the loss of information. We applied the exponential linear units (ELUs) as an activation function in convolutional block VII. The ELU output function is defined as:
$$
f(x) =
\begin{cases}
x, & x > 0 \\
\alpha \left( e^{x} - 1 \right), & x \le 0
\end{cases}
$$
where $x$ is the input to the activation and $\alpha > 0$ is a hyperparameter that sets the saturation value for negative inputs.
For model training, we performed 30 iterations, and the classification accuracies were evaluated on the test data.
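As a worked example of how the stated filter sizes act on the 30 × 300 input, the sketch below applies the standard convolution output-size formula. It assumes no zero padding, which the text does not specify, and it treats each filter in isolation without modeling the pooling layers, so the numbers are illustrative rather than the actual layer shapes of Table 1. The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


channels, points = 30, 300  # EEG channels x sampling points (3 sec. at 100 Hz)

# Temporal filter 1 x 5 with stride 1 x 1: the channel axis is untouched,
# while the time axis shrinks by kernel - 1 = 4 samples (assuming no padding).
temporal_shape = (conv_output_size(channels, 1), conv_output_size(points, 5))

# Spatial filters 5 x 1 and then 3 x 1 applied along the channel axis.
after_5x1 = conv_output_size(channels, 5)
after_3x1 = conv_output_size(after_5x1, 3)

# With zero padding of 2 ("same" padding for a width-5 kernel), the
# temporal length would be preserved instead.
same_padded = conv_output_size(points, 5, padding=2)
```

Under these assumptions the temporal filter maps 30 × 300 to 30 × 296, and the two spatial filters reduce the channel axis from 30 to 26 and then to 24.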

E. CLASSIFICATION PERFORMANCE EVALUATION

1) OFFLINE ANALYSIS
For the offline analysis, we evaluated the classification accuracies based on the proposed MFB-CNN architecture for four different mental states across each subject. First, we randomly divided the entire dataset into a training set and a test set using 50:50 proportion. For a fair evaluation of classification accuracy, the cross-validation method was used. We evaluated the classification accuracies using the true label of samples and classification output each time.
In all paradigms, the data for each trial (1 min.) were segmented into 3 sec. segments without overlap in the offline analysis, as shown in Fig. 4(a). In the fatigue experiment, subjects rated their own condition subjectively using the KSS. Because the KSS value sometimes could not be entered when a subject was drowsy, the number of data samples differed across subjects. For example, S1 obtained 687 samples for fatigue and 427 samples for the normal state. In contrast, S2 recorded 482 samples for fatigue and 662 samples for the normal state. Each subject had 900 samples and 360 samples in the workload and distraction experiments, respectively. Across all subjects, we obtained 4,896 samples for fatigue, 6,300 samples for workload, 2,520 samples for distraction, and 3,144 samples for the normal state.
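The non-overlapping segmentation can be illustrated with a short sketch. It assumes the 100 Hz sampling rate obtained after down-sampling, so that each 3 sec. segment contains 300 points; the function name `segment_bounds` is hypothetical.

```python
from typing import List, Tuple


def segment_bounds(n_samples: int, fs: int, win_sec: float = 3.0) -> List[Tuple[int, int]]:
    """Start/end sample indices of consecutive non-overlapping windows."""
    win = int(round(win_sec * fs))
    return [(start, start + win) for start in range(0, n_samples - win + 1, win)]


fs = 100                                # Hz, after down-sampling
bounds = segment_bounds(60 * fs, fs)    # one 1 min. trial

# Totals reported in the text follow from the per-subject counts.
n_subjects = 7
workload_total = n_subjects * 900
distraction_total = n_subjects * 360
```

A 1 min. trial therefore yields 20 segments of 300 samples each, and the per-subject counts of 900 and 360 multiply out to the reported 6,300 workload and 2,520 distraction samples.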
For each mental state, we divided the data samples evenly in a 50:50 proportion between training and testing (Fig. 4(a)). Accordingly, 2,448 fatigue samples, 3,150 workload samples, 1,260 distraction samples, and 1,572 normal-state samples were used for training, and the same numbers were used for testing.
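A per-class 50:50 split of this kind, repeated with different shuffles, might look like the sketch below. It pools the reported sample counts across subjects for simplicity, which is an assumption (the evaluation in the paper is reported per subject), and the function name `stratified_half_split` is hypothetical. Swapping the two halves gives the two folds of a 2-fold cross-validation, and four shuffles give eight train/test evaluations.

```python
import random
from collections import Counter, defaultdict


def stratified_half_split(labels, seed):
    """Split sample indices 50:50 within each class after shuffling."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    first, second = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        half = len(indices) // 2
        first.extend(indices[:half])
        second.extend(indices[half:])
    return sorted(first), sorted(second)


# Class counts taken from the text (all subjects pooled).
counts = {"fatigue": 4896, "workload": 6300, "distraction": 2520, "normal": 3144}
labels = [name for name, n in counts.items() for _ in range(n)]

folds = []
for seed in range(4):                  # four different shuffle orders
    a, b = stratified_half_split(labels, seed)
    folds.append((a, b))               # train on a, test on b
    folds.append((b, a))               # train on b, test on a

train, test = folds[0]
train_counts = Counter(labels[i] for i in train)
```

With these counts, each training half contains 2,448 fatigue, 3,150 workload, 1,260 distraction, and 1,572 normal-state samples.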

2) PSEUDO-ONLINE ANALYSIS
We also evaluated the proposed MFB-CNN in the pseudo-online analysis to confirm the feasibility of continuous decoding of human mental states. In this study, owing to the low commercialization of BCI techniques and the environmental constraints of piloting, the evaluation was conducted through a pseudo-online analysis, which is similar to a real-time experiment. In such environments, most BCI investigators have adopted pseudo-online analysis instead of fully online tests to demonstrate the feasibility of real-time scenarios [47]-[49]. Therefore, we first confirmed the feasibility of decoding with MFB-CNN through the offline analysis, and then re-trained MFB-CNN using the entire dataset, treating it as general calibration data for the pseudo-online analysis [49]. We conducted the pseudo-online analysis with the classifier that classified the four mental states in the offline analysis.
A sliding window of 3 sec. duration with an overlap of 0.5 sec. was applied to obtain continuous classification output, as shown in Fig. 4(b). Each window underwent the same preprocessing steps shown in Fig. 3, and the model parameters obtained in the training phase were used for classification. The classification results of the pseudo-online analysis were evaluated for each sub-trial by comparing the true labels of the samples with the predicted classification output.
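The sliding-window extraction can be sketched as follows. The sketch interprets "overlap" as the duration shared by two consecutive windows, so that a 3 sec. window with a 0.5 sec. overlap advances by 2.5 sec.; this interpretation, the 100 Hz rate, and the function name `sliding_windows` are assumptions.

```python
from typing import List, Tuple


def sliding_windows(n_samples: int, fs: int, win_sec: float = 3.0,
                    overlap_sec: float = 0.5) -> List[Tuple[int, int]]:
    """Start/end sample indices of overlapping windows over a recording."""
    win = int(round(win_sec * fs))
    step = win - int(round(overlap_sec * fs))
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    return [(start, start + win) for start in range(0, n_samples - win + 1, step)]


fs = 100
windows = sliding_windows(60 * fs, fs)   # one 1 min. recording

# Number of samples shared by the first two windows.
shared = windows[0][1] - windows[1][0]
```

Under this reading, a 1 min. recording produces 23 windows that each share 50 samples (0.5 sec.) with their predecessor. Each window would then be preprocessed and passed to the trained classifier to give one output per step.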
In the case of fatigue, 13,800 samples were used for training and testing, respectively. Similarly, 20,825 samples were used for workload, 23,954 samples for distraction, and 8,908 samples for the normal state. As in the offline analysis, we selected 8,908 samples from each mental state for fairness, so that a total of 35,632 samples were used for training and testing.

A. CLASSIFICATION PERFORMANCES IN THE OFFLINE ANALYSIS
Table 3 shows the overall classification performances for each subject in the offline analysis. We applied a 2-fold cross-validation method to evaluate classification accuracy fairly; 50% of the samples were randomly selected for training and the remaining 50% were reserved for validation. Also, we repeated the two-fold cross-validation four times, adopting a different shuffle order each time [50]. We obtained the grand-average accuracy of 0.75 (±0.04) for decoding the four mental states. S6 showed the highest classification accuracy of 0.79, and S4 showed the lowest classification accuracy of 0.71.
Table 4 presents the comparison of classification accuracies between the conventional methods and the proposed network in the offline analysis. The conventional methods used for performance comparison were power spectral density-SVM (PSD-SVM) [51], PSD-kNN [20], channel-wise CNN (CCNN) [52], and deep long short-term memory (LSTM-D) [53]. PSD-SVM extracted the PSD of the theta, alpha, and beta bands as features while neglecting the spatial information of EEG signals, and SVM was used for the classification [51]. Similarly, PSD-kNN extracted the PSD of the theta, alpha, and beta bands as features and was trained using kNN with the number of neighbors set to 5 [20]. CCNN used a channel-wise filter to reflect the spatial information of EEG signals for classification [52]. The LSTM-D method comprised sequence-to-sequence and many-to-one LSTM layers [53]. The comparison of classification accuracies showed that our proposed MFB-CNN with temporal-spatio EEG filters had the highest accuracy for classifying various mental states.
To verify the classification performance difference between the conventional methods and the proposed model, we applied the analysis of variance (ANOVA) with the Bonferroni correction for multiple comparisons. Initially, we validated the normality and homoscedasticity owing to the small number of samples. The Shapiro-Wilk test did not reject the null hypothesis (H0) of normality for any conventional method, and the assumption of homoscedasticity based on Levene's test was also met for each group. Hence, we conducted a statistical analysis between MFB-CNN and the conventional methods that satisfied these conditions using multiple comparisons with a Bonferroni correction. MFB-CNN differed most significantly from the machine learning methods, PSD-SVM and PSD-kNN (p < 0.001).
In addition, we evaluated the degree of confusion for each subject. Fig. 5 shows the confusion matrices when classifying the four mental states using MFB-CNN for all subjects in the offline analysis. Each column of the matrix contains the target states and each row represents the predicted states. For the representative subject (S1), the highest true positive rate (TPR) was obtained for fatigue (0.99). For S2 and S6, the highest TPR was obtained for workload (0.91 and 0.97, respectively). For S3, the highest TPR was obtained for distraction (0.86). For S4, S5, and S7, the highest TPR was obtained for fatigue (0.94, 0.99, and 0.81, respectively).
The experimental results showed some characteristics common to all subjects. When the target state was distraction, the rates of misclassification as fatigue or workload were 0. Similarly, when the target state was fatigue, the rates of misclassification as distraction or workload were 0. Additionally, when the target state was workload, the rates of misclassification as fatigue or distraction were 0. Finally, the normal state could not be detected correctly for most subjects. The TPR for the normal state was the lowest among the four states; in other words, for most subjects the confusion occurred between the normal state and the other states.
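To make the layout of the confusion matrices concrete, the sketch below computes the per-class TPR from a matrix whose columns are target states and whose rows are predicted states, as described for Fig. 5. The counts are made up for illustration and are not taken from the paper; they only mimic the reported pattern in which the three abnormal states are never confused with one another. The function name `true_positive_rates` is hypothetical.

```python
def true_positive_rates(cm, classes):
    """Per-class TPR for a matrix with predicted rows and target columns.

    TPR of class j = cm[j][j] divided by the sum of column j.
    """
    rates = {}
    for j, name in enumerate(classes):
        column_total = sum(row[j] for row in cm)
        rates[name] = cm[j][j] / column_total if column_total else float("nan")
    return rates


classes = ["fatigue", "workload", "distraction", "normal"]

# Illustrative counts only (rows: predicted, columns: target).
cm = [
    [90, 0, 0, 20],    # predicted fatigue
    [0, 85, 0, 15],    # predicted workload
    [0, 0, 80, 10],    # predicted distraction
    [10, 15, 20, 55],  # predicted normal
]

tpr = true_positive_rates(cm, classes)
lowest = min(tpr, key=tpr.get)
```

In this toy matrix every error involves the normal state, either as the target or as the prediction, and the normal state has the lowest TPR.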

B. CLASSIFICATION PERFORMANCES IN THE PSEUDO-ONLINE ANALYSIS
We evaluated the decoding of mental states using the One-Versus-Rest (OVR) strategy, that is, we evaluated one of the mental states against the rest states. Unlike typical BCI research, in mental state detection the recording session differs for each mental state because a long recording time is needed to obtain data related to the state change. If the data of separate recording sessions were artificially concatenated for the pseudo-online analysis, continuous decoding of mental states could not be conducted at the concatenation points because the EEG signals would not be continuous there. Therefore, we applied the OVR strategy as state detection and used the trained classifier [...] Table 5. In addition, S1 showed significantly high accuracy in detecting workload and distraction. Similarly, S2 and S4 showed high performance in detecting fatigue. Also, S6 showed high accuracy in detecting workload.
Table 6 shows the comparison of detection accuracies between the conventional methods and MFB-CNN. We used the same conventional methods as in the offline analysis. The comparison showed that MFB-CNN had the highest accuracy for detecting each mental state. CCNN is a specialized method that considers spatial information. The accuracy of our proposed model is higher than that of CCNN because MFB-CNN considers not only spatial information but also temporal information. LSTM-D is a specialized method for considering sequential information. However, MFB-CNN has higher accuracy than LSTM-D because it considers both spatial and spectral information as well as sequential information.
TABLE 6. Comparison of detection accuracies between the conventional methods and MFB-CNN in the pseudo-online analysis.
Fig. 6 represents the detection output for each state during the pseudo-online experiment. We used a 3 sec. sliding window with an overlap of 0.05 sec. Each gray square indicates the detection results for the class of the corresponding mental state. The white squares represent the detection output for the rest states. S2, S7, and S6 showed relatively clear results for fatigue, workload, and distraction, respectively, compared with the other subjects.
In the case of workload, we found that in a small number of trials the majority of sliding windows were classified as rest states. This is a plausible result when the task was recorded as a failure, since the instructor sitting next to the subject determined whether each task was a 'Pass' or 'Fail'.
In terms of distraction, we confirmed that the early parts of many trials were classified as rest states. Before starting the distraction experiment, we fully explained the procedure to the subjects and conveyed the precautions (minimize body and eye movements, count the number of words in the ATC messages inwardly without using fingers, etc.). However, in the early part of the experiment, subjects had difficulty performing the task and did not count the number of words in the ATC messages inwardly.
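The one-versus-rest evaluation used in this subsection can be sketched by collapsing four-class outputs into "target state" versus "rest" before scoring. How the detection accuracy was computed exactly is not recoverable from the text, so the scoring rule below (fraction of windows whose binarized prediction matches the binarized label) is an assumption, and the function name `ovr_accuracy` is hypothetical.

```python
def ovr_accuracy(y_true, y_pred, target):
    """Accuracy after mapping every label to 'target' vs. 'rest'."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    hits = sum((t == target) == (p == target) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Illustrative window-level labels, not data from the paper.
y_true = ["fatigue", "fatigue", "normal", "normal", "fatigue"]
y_pred = ["fatigue", "workload", "normal", "distraction", "fatigue"]

fatigue_acc = ovr_accuracy(y_true, y_pred, "fatigue")
```

In this example the fourth window counts as correct even though "distraction" is the wrong four-class label, because both the label and the prediction fall on the "rest" side; only the second window, a missed fatigue detection, counts as an error.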

C. TRAINING CONVERGENCE CURVE FOR MFB-CNN
To observe the convergence of MFB-CNN, Fig. 7 shows the training accuracy curve of the representative subject (S2), that is, the percentage of correctly classified samples in the training set over epochs. For all subjects, the training process converged within about 10 epochs, which shows that the proposed model can be trained with a small number of epochs. Additionally, we obtained shorter computational times for model training (40.76 sec.) and testing (0.0029 sec.) compared with the other conventional methods.

D. NEUROPHYSIOLOGICAL ANALYSIS
Moreover, we analyzed the neurophysiological phenomena using the temporal, spectral, and spatial information of EEG features for each mental state (fatigue, workload, and distraction). The proposed MFB-CNN was designed to consider this neural information using the temporal-spatio filters. Fig. 8 shows the grand-average spectrograms of the mental states for each channel according to the spectral bands (the delta, theta, alpha, and beta bands) of EEG signals. The spectrograms represent the grand-average power to visualize brain activation. Each column of the matrix reflects the frequency and each row represents a channel. The PSD was computed for all EEG channels and each frequency band across all subjects. The spectrograms show that the PSD of each spectral band and each channel hardly differed among the mental states.
Fig. 9 represents the grand-average PSD of each spectral band of EEG signals corresponding to the four mental states. Each mental state has the highest PSD of a particular band in a certain brain region. The highest PSD of a particular band means the highest activation degree for a specific mental state in that band. In the temporal region, the activation degree of fatigue was higher in the delta, theta, and alpha bands compared with that of the other mental states. In the central region, the highest activation degree was observed for workload in the alpha and beta bands. In the parietal region, the activation degree of distraction was the highest in the theta band compared with the other mental states. In addition, we confirmed the statistical significance through ANOVA with the Bonferroni correction. Specifically, at the temporal region, 'fatigue' vs. 'the other mental states (workload, distraction, and the normal state)' showed a statistically significant difference in the delta band (p < 0.001). Hence, the delta band at the temporal region carries meaningful spatial and spectral information for detecting 'fatigue' among various mental states. Also, at the temporal region, 'fatigue' vs. 'normal' showed a statistically significant difference in the theta band (p < 0.001). Additionally, at the central region, 'the normal state' vs. 'the abnormal states (fatigue, workload, and distraction)' showed a statistically significant difference in the theta and alpha bands (p < 0.001).
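The band-wise PSD comparison can be illustrated with a small periodogram sketch in plain Python. The paper does not state how the PSD was estimated or which band edges were used, so the direct DFT periodogram and the band limits below (delta 1-4 Hz, theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz) are assumptions, as are the names `periodogram` and `band_powers`.

```python
import cmath
import math

# Assumed band edges in Hz; the paper does not list them.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def periodogram(x, fs):
    """Frequencies and periodogram values of a mean-removed signal (direct DFT)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centered))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


def band_powers(x, fs):
    """Sum of periodogram values inside each band (proportional to band power)."""
    freqs, power = periodogram(x, fs)
    return {
        name: sum(p for f, p in zip(freqs, power) if low <= f < high)
        for name, (low, high) in BANDS.items()
    }


# One 3 sec. window at 100 Hz containing a pure 10 Hz oscillation.
fs = 100
window = [math.sin(2 * math.pi * 10 * k / fs) for k in range(3 * fs)]
powers = band_powers(window, fs)
dominant = max(powers, key=powers.get)
```

For the synthetic 10 Hz signal the alpha band dominates, as expected. Applied per channel and averaged over windows and subjects, the same quantity would give region-by-band values that can be compared across mental states.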

IV. DISCUSSION
In this paper, we demonstrated the feasibility of continuous decoding of various mental states (fatigue, workload, distraction, and the normal state) based on a deep learning method in a pilot environment. The proposed MFB-CNN was based on a deep CNN architecture using temporal-spatio EEG filters related to the pilots' mental states. To the best of our knowledge, this study is one of the novel attempts to conduct continuous decoding of various mental states with robust classification performance.
Recent works have focused on the classification of user mental states with high performance using advanced machine learning algorithms and deep learning methods [54]-[60]. Zhang et al. [54] introduced two deep learning-based frameworks with novel spatio-temporal preserving representations of raw EEG streams to identify human intentions with high performance. The two frameworks consisted of both convolutional and recurrent neural networks, effectively exploring the preserved spatial and temporal information in either a cascade or a parallel manner. Also, they applied a matrix transformation to 2-D data based on the channel locations of the brain regions. Bashivan et al. [55] proposed a novel approach for learning such representations from multi-channel EEG time-series and demonstrated its advantages in the context of a mental workload classification task. Their proposed approach was designed to preserve the spatial, spectral, and temporal structure of EEG. In addition, they used topographical maps based on the channel locations of the brain regions. In this work, we decoded the pilots' mental states by applying the temporal filter and spatial filter in the preprocessing step for deep learning training.
Accurate detection of the mental states of drivers or pilots has attracted attention as a critical issue in AI fields owing to the technical advances of autonomous vehicles and autopilot systems. Therefore, some researchers have focused on detecting mental states [15], [17], [22], [25], [61], while a few others have analyzed physiological signals for detecting the mental states of users in the aircraft environment [14], [18].
Specifically, although aviation accidents do not happen as frequently as vehicle accidents, when they occur, they cause far worse casualties and larger explosions. These aviation accidents not only produce immediate effects (e.g., casualties and aircraft explosion) but can also induce secondary impacts, such as harm to the surrounding environment [62]. Statistically, more than 70% of aviation accidents can be attributed to human factors, such as pilot fatigue; drowsiness may also be an important contributor to a large number of aviation accidents [63]. Therefore, in this study, we focused on analyzing EEG data regarding fatigue, workload, and distraction in simulated flights with human pilots. In the field of detecting or classifying human mental states, the pilot environment is very rarely studied. Only volunteers with over 100 hr. of flight experience were allowed to participate in this study. We designed experimental paradigms to induce various mental states. To obtain high-quality EEG signals for the various mental states, we conducted the experiments under strict conditions, and collecting the experimental data was quite challenging owing to the difficulty of constructing the experimental environment. Our experimental data reflected the pilots' mental condition, and we showed that MFB-CNN could contribute to the prevention of large aviation and vehicle accidents in the real world, as it can robustly detect users' mental conditions.
Although a direct performance comparison with related studies was impossible owing to different experimental protocols and paradigms, we compared the classification performances of the conventional methods and MFB-CNN using our experimental data. MFB-CNN showed the highest classification accuracies in both the offline and pseudo-online analyses. In the offline analysis, the grand-average classification accuracy was 0.75 (±0.04). In the pseudo-online analysis, we obtained detection accuracies of 0.72 (±0.20), 0.72 (±0.27), and 0.61 (±0.18) for fatigue, workload, and distraction, respectively. The detection accuracy for distraction was relatively low because the subjects tended not to count the number of words in the ATC messages inwardly, although we instructed them to do so. In terms of classification accuracy, the proposed MFB-CNN outperformed the conventional methods by approximately 0.13 in the offline analysis. Also, in the pseudo-online analysis, the detection accuracies of MFB-CNN were higher by 0.11, 0.10, and 0.09 for the respective mental states. Moreover, it reached high training accuracy using only a small number of epochs, as depicted in Fig. 7.
In addition, the standard deviation of performance across subjects in the offline analysis was the lowest when using MFB-CNN compared with the other existing methods. The machine learning methods, PSD-SVM and PSD-kNN, tended to show high performance variation among the subjects (standard deviation of 0.09). On the other hand, the deep learning methods, CCNN and LSTM-D, showed lower performance variation across subjects (0.06). Among the deep learning methods, the standard deviation of MFB-CNN was the lowest (0.04). However, the assumption of homoscedasticity based on Levene's test could not be met because the standard deviations of the machine learning and deep learning methods differed considerably. Hence, we could not adopt statistical multiple comparisons between machine learning and deep learning methods. Also, we presented the grand-average spectral peak of each band power using the PSD for the mental states, as shown in Fig. 9. At the temporal region, the delta band can be used to classify 'fatigue' vs. 'the other mental states'. Moreover, at the temporal region, the theta band can be used to classify 'fatigue' vs. 'the normal state'. Finally, at the central region, the theta and alpha bands can be used to classify 'the normal state' vs. 'the abnormal states'.
The MFB-CNN architecture is composed of more convolution blocks than the existing deep learning architectures in the BCI field; therefore, model training has a high computation cost. Owing to this limitation, the current MFB-CNN architecture could be difficult to apply to real-world BCI scenarios, because a long calibration time could induce fatigue and lower the subjects' attention. Therefore, to classify mental states more accurately and rapidly in real-world BCI scenarios, we plan to modify the MFB-CNN architecture into a shallower design while maintaining classification performance [15], [39], [64], and to apply a weight-parameter sharing method [65] and a data augmentation method [66] to the model. As depicted in Fig. 7, we confirmed the possibility of making the model shallower since a stable performance was shown despite the small number of convolutional blocks. Additionally, each state activates different brain regions. At present, our proposed model uses EEG features extracted from the whole brain area (Fig. 9). To improve the robustness of its performance, we plan to adopt the principle of a hierarchical deep learning model that can fully utilize the spatial characteristics of each state [65].

V. CONCLUSIONS AND FUTURE WORKS
In this study, we investigated the feasibility of classifying the pilots' various mental states (fatigue, workload, distraction, and the normal state) using only EEG signals. Also, we proposed MFB-CNN for classifying mental states with high accuracy. We obtained EEG data corresponding to the four mental states from seven subjects using our designed paradigms in the simulated flight environment. Our proposed model achieved a classification accuracy of 0.75 in the offline analysis. Also, MFB-CNN achieved detection accuracies of 0.72, 0.72, and 0.61 for the fatigue, workload, and distraction states, respectively, during continuous decoding.
In future works, acquiring randomized sequential modes of each mental state should be considered. Additionally, an inter-session variation of physiological signals should be explored in order to develop a robust real-time BCI system. Hence, we will implement the proposed model in real-world environments.