A WPCA-Based Method for Detecting Fatigue Driving From EEG-Based Internet of Vehicles System

Fatigue driving is the main cause of traffic accidents. Analysis of electroencephalogram (EEG) signals has attracted wide attention for identifying fatigue driving. With the development of the Internet of Vehicles (IoV), we aim to establish an EEG-based IoV traffic management system to improve traffic safety. In the proposed system, real-time diagnosis is a significant factor, and improving detection speed is our main concern. EEG signals generate a large amount of spatially oriented data over a relatively short duration; hence, their dimension needs to be reduced effectively before analysis. We propose a feature reduction method for EEG signals based on a novel weighted principal component analysis (WPCA) algorithm. First, EEG features are extracted by an autoregressive (AR) model. Second, we calculate the influence of each feature on fatigue-state classification performance, and the resulting accuracy reduction values are normalised as the feature weights. Finally, these weights are applied in WPCA to reduce the EEG features. To verify the effectiveness of the algorithm, we carried out a simulated driving experiment involving eight participants. For comparison, power spectral density (PSD) and differential entropy (DE) were also used to extract EEG features. A support vector machine (SVM) was used as the classifier in the fatigue driving classification experiments. The experimental results show that the WPCA method effectively reduces the feature dimension for different EEG feature extraction methods, speeds up computation, and achieves higher classification accuracy for fatigue driving.


I. INTRODUCTION
The Internet of Things (IoT) is a paradigm that connects objects through the Internet so that they can work together to achieve new goals [1]. The Internet of Vehicles (IoV) is an application of the IoT that can collect information about vehicles and drivers [2]. Among the information it collects, driver fatigue has received particular attention.
In the past ten years, the number of cars in China has increased dramatically, and so has the number of traffic accidents [3]. According to relevant reports, China has become one of the countries with frequent traffic accidents. Many factors can cause traffic accidents, the main one being fatigue driving. In a fatigued state, drivers tend to be distracted, think less actively, and react more slowly, all of which increase the likelihood of traffic accidents [4]. Therefore, it is particularly important to detect the driver's fatigue state accurately and quickly. Driving fatigue detection based on physiological information is an objective means of detecting and identifying the driver's fatigue state through changes in physiological indicators [5]. Studies have shown that the body's fatigue state can be effectively detected and evaluated using physiological information such as body temperature, blood pressure, electrocardiogram (ECG), electroencephalogram (EEG), and electromyography (EMG). Among these, analysis of EEG signals has attracted wide attention.
The associate editor coordinating the review of this article and approving it for publication was Mu-Yen Chen.
EEG reflects the electrophysiological activity of the cranial nervous system [6] and can be used to detect and analyse the driver's fatigue state. For example, Gao et al. proposed an EEG-based spatio-temporal convolutional neural network to detect driver fatigue [7], whilst Chen et al. proposed a graph analysis method of functional brain network topology (using a minimum spanning tree) for detecting driver drowsiness [8]. Wang et al. developed a novel real-time driving fatigue detection methodology based on dry-electrode EEG signals [9], and Pei et al. proposed a method based on EEG signal analysis to study the fatigue characteristics of drivers of different ages [10]. Wang et al. proposed a method for analysing fatigue EEG signal features based on wavelet entropy [11], whilst Hu et al. proposed a driving fatigue detection method based on EEG signals using fuzzy entropy [12]. Min et al. proposed a method based on multi-entropy fusion of EEG signals to detect driver fatigue [13]. Wang et al. proposed an EEG-based system that evaluates driver fatigue with only one electrode using ensemble learning [14]. Pathak and Jayanthy designed a portable, low-cost brain-computer interface drowsiness-detection system [15].
EEG signals generate a large amount of spatially oriented data over relatively short durations, which leads to a big data problem. Big data requires secure storage and substantial computing resources for real-time processing [21]-[23]. To increase processing efficiency and satisfy the real-time requirement more effectively, the dimension of EEG signals needs to be reduced appropriately. Principal component analysis (PCA) is a commonly used dimension reduction method. It can identify the main influencing factors among many variables, reveal the underlying structure of the data, and simplify complex problems. Liu et al. proposed a hybrid feature-reduction scheme using 14 different features extracted from EEG recordings [24]. Maximum relevance minimum redundancy (mRMR) was applied to reorder the combined features so that they were maximally relevant to the labels and minimally redundant with one another, and PCA was then used to further reduce the generated features to their principal components. Bousseta et al. extracted EEG features by continuous wavelet transform (CWT) and empirical mode decomposition (EMD) [25]. PCA was introduced for feature dimension reduction, and left- and right-hand motor imagery was classified using a Support Vector Machine (SVM) with linear and radial basis function (RBF) kernels. Sun et al. proposed a PCA-based fusion algorithm for nonlinear global features and power spectral entropy [26]. Combining the power spectral entropy of EEG with nonlinear attribute features (such as the Hurst exponent), they used PCA for dimension reduction and feature fusion, and SVM as the classifier for emotion recognition. Neshov et al. proposed an algorithm that could identify five psychological tasks using 6-channel EEG data [27]. The method divided the original EEG signals into frames, calculated their spectra, applied the Gaussian second derivative to extract features, and used PCA to reduce the feature dimension. Li et al. extracted eight positive and negative emotions from a dataset comprising 14 channels from different regions of the brain [28]. The δ, θ, α, and β rhythms were extracted using the wavelet transform, and PCA was then used to fuse the wavelet features, approximate entropy, and Hurst exponents, and to reduce the feature dimension. Zarei et al. proposed a feature extraction method combining PCA with the cross-covariance technique (CCOV) [29]. The algorithm extracted discriminative information about mental states from EEG signals in a brain-computer interface application, and applied correlation-based variable selection.
In previous applications of PCA, dimension reduction is based on data variance, and every feature dimension is treated equally. However, different features play different roles in the recognition process; therefore, it is necessary to assign different weights to the feature values [30]. Weighted PCA (WPCA) weights the original feature data and then, following the idea of PCA, finds the linear combination with the largest variance [31].
With the development of the IoV, we propose an EEG-based IoV traffic management system to improve traffic safety [32]- [37].The proposed system is composed of a fatigue detection system, the IoV, and a traffic management platform, as shown in Fig. 1.
The fatigue detection system includes an electric source imaging (ESI) neuroscan system and a fatigue diagnosis system. The ESI neuroscan system is equipped with 40 electrodes arranged according to the International 10-20 system, with a sampling frequency of 200 Hz. Of the 40 electrodes, 4 belong to the internal structure, 2 are reference electrodes, and 4 (placed along the horizontal and vertical directions) monitor eye movement; the remaining 30 electrodes collect EEG signals. The fatigue diagnosis system is a supercomputer running the WPCA and SVM algorithms: WPCA is a dimension reduction method, and SVM is an efficient supervised binary classifier.
Working principle of the EEG-based IoV traffic management system: the system detects the driver's fatigue status in real time. It works online, so timely processing must be guaranteed. The driver's EEG signals are collected by the ESI neuroscan system and transmitted to the fatigue diagnosis system in real time through a 5G network. Prior to acquisition, the skin impedance of the EEG electrodes is reduced to below 5 kΩ by injecting conductive gel. The fatigue diagnosis system reduces the dimension of the EEG data using the WPCA algorithm, diagnoses the fatigue status using SVM, and uploads the detection result to the IoV. If fatigue is detected, the traffic management platform reminds the driver either to reduce speed or to stop and rest, whilst warning the surrounding vehicles. The platform also records the date, time, location, and fatigue-frequency statistics for each driver. In this way, the system can improve traffic safety.
Real-time diagnosis is of significant importance in an EEG-based IoV traffic management system; hence, we need to improve the detection speed. Because EEG is recorded continuously on many channels at a high sampling rate, a large amount of data is generated in a short period, so methods for reducing the dimension of EEG signals need to be studied in depth.
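As a rough illustration of this data volume, using only the acquisition settings stated in this paper (30 EEG channels sampled at 200 Hz and a recording of roughly 90 min per subject), the raw sample count works out as follows:

```latex
\begin{aligned}
30\ \text{channels} \times 200\ \text{samples/s} &= 6000\ \text{samples/s},\\
6000\ \text{samples/s} \times (90 \times 60)\ \text{s} &= 3.24 \times 10^{7}\ \text{samples per subject}.
\end{aligned}
```

Each 1-s window therefore holds 6000 raw values before any feature extraction or dimension reduction.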
In this paper, we propose a feature reduction method based on a novel WPCA algorithm for EEG signals. The rest of this paper is organised as follows: Section II introduces the simulated driving experiment and data acquisition. Section III describes the components of the WPCA algorithm. Experimental results and discussion are presented in Section IV. Finally, Section V summarises the paper.

II. SIMULATION DRIVING EXPERIMENT AND DATA ACQUISITION
In the simulated driving experiment, 8 right-handed college students aged 19 to 26 (4 males and 4 females; mean age: 22.73 years) volunteered to participate, none of whom had any mental illness. In previous EEG studies, eight subjects were sufficient to demonstrate the effectiveness of the proposed methods [7], [38], [39]. For two days before the experiment, subjects were asked to avoid ingesting any anti-fatigue products and to maintain adequate rest, sleeping more than 7 h a night. Because none of the subjects had used the driving simulator before, they practised driving before the experiment until they became proficient.
We conducted the experiment at the Intelligent Systems Laboratory in the Tianjin University Complex Network. We used the PGFD001 driving simulator, which was equipped with pedals, a steering wheel, and a clutch. In the virtual driving software 3DInstructor2, we used the ordinary car (Phaeton2.0L), which shifts gears automatically by default. In addition, a webcam (360D618), a projector, and stereo speakers were added to enhance perception. The experimental setup is shown in Fig. 2.
Full-scalp EEG signals were collected in an isolated, quiet room. We also monitored the subjects' facial status through a front-facing camera to verify fatigue. Before the experiment, it took 10 min to set up the scene and 20 min to practise driving. After the experiment started, the subjects drove until mild fatigue was reported, which usually occurred after about 30 min; before this point, the drivers were considered to be in an alert state.
Following 10 min of continuous driving (as a transition), the subjects drove for another 30 min in a fatigued state. The recording time for each subject was close to 90 min, varying slightly due to individual differences.
To record the drivers' fatigue state, we used the ESI neuroscan system with 40 electrodes arranged according to the International 10-20 system and a sampling frequency of 200 Hz. Prior to acquisition, the skin impedance of the EEG electrodes was reduced to below 5 kΩ by injecting conductive gel. During the experiment, all subjects were required to minimise unnecessary body movements and to maintain a constant driving speed to avoid collisions. Of the 40 electrodes, 4 belong to the internal structure, 2 are reference electrodes, and 4 (placed along the horizontal and vertical directions) monitor eye movement. The raw EEG signals were contaminated by high-frequency noise and by low-frequency interference from eye electricity (electrooculography); therefore, they were pre-processed with the EEGLAB toolbox. After removing noise and ocular artefacts, we obtained 30-channel EEG signals.
From the collected data, we selected the first 10 min of the alert state as non-fatigue data and the last 10 min of the fatigue state as fatigue data. The data were segmented with a non-overlapping sliding window of fixed length 1 s. After segmentation, we obtained 1200 samples from each subject (600 non-fatigue and 600 fatigue).
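The segmentation step can be sketched with NumPy as below. This is an illustrative sketch, not the authors' pipeline: it assumes each selected 10-min segment is already a pre-processed `(channels, samples)` array, and the helper names `segment` and `build_dataset` and the 0/1 label convention are hypothetical.

```python
import numpy as np

FS_HZ = 200        # sampling frequency stated in the paper
N_CHANNELS = 30    # EEG channels left after pre-processing
WINDOW_S = 1.0     # non-overlapping 1-s windows


def segment(eeg: np.ndarray, fs: int = FS_HZ, window_s: float = WINDOW_S) -> np.ndarray:
    """Split a (channels, samples) recording into non-overlapping windows.

    Returns an array of shape (n_windows, channels, window_length).
    Trailing samples that do not fill a whole window are discarded.
    """
    win = int(round(fs * window_s))
    n_windows = eeg.shape[1] // win
    trimmed = eeg[:, : n_windows * win]
    # Cut each channel into consecutive blocks, then move the window axis to the front.
    return trimmed.reshape(eeg.shape[0], n_windows, win).transpose(1, 0, 2)


def build_dataset(alert: np.ndarray, fatigue: np.ndarray):
    """Stack alert windows (label 0) and fatigue windows (label 1); labels are hypothetical."""
    a, f = segment(alert), segment(fatigue)
    X = np.concatenate([a, f])
    y = np.concatenate([np.zeros(len(a), dtype=int), np.ones(len(f), dtype=int)])
    return X, y


if __name__ == "__main__":
    ten_min = 10 * 60 * FS_HZ
    # Placeholder arrays stand in for the selected alert and fatigue segments.
    alert = np.zeros((N_CHANNELS, ten_min), dtype=np.float32)
    fatigue = np.zeros((N_CHANNELS, ten_min), dtype=np.float32)
    X, y = build_dataset(alert, fatigue)
    print(X.shape, int(y.sum()))  # (1200, 30, 200) 600
```

Two 10-min segments at 200 Hz give 600 one-second windows each, matching the 1200 samples per subject reported above.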

III. WEIGHTED PRINCIPAL COMPONENT ANALYSIS
To satisfy the real-time processing requirement, we need to reduce the dimension of the EEG signals.
The most commonly used dimension reduction method is PCA, which can identify the main influencing factors among many variables, reveal the underlying structure of the data, and simplify complex problems. PCA transforms the data into a new coordinate system and projects the high-dimensional data into a lower-dimensional space. Its main idea is to find the linear combination of the original variables that accounts for the largest variation in their values, that is, the linear combination with the largest variance. PCA reduces dimension based on data variance and treats each feature dimension equally. However, different features play different roles in the recognition process. It is therefore reasonable to strengthen the features that are key to recognition, while weakening non-critical features (such as those carrying little correlated information), to improve recognition accuracy. Based on this idea, we propose a WPCA method. The WPCA algorithm flow chart is shown in Fig. 3.
AR models have been widely used in EEG research [40]-[44]. The advantage of the AR model lies in its inherent ability to model the spectral peaks of EEG signals. As an all-pole model, it can effectively represent sharp changes in the spectrum, but the model order must be selected. If the order is too low, the signal is not captured adequately; if it is too high, more noise is captured.
The AR model is defined as

$$x(t) = \sum_{k=1}^{p} a(k)\,x(t-k) + e(t), \tag{1}$$

where $x(t)$ represents the EEG data at time $t$, $p$ is the AR model order, $e(t)$ is a white noise sequence, and $a(k)$ are the AR model coefficients.
We tested AR models of orders 3 through 8. The 3rd-, 4th-, and 5th-order AR models performed best, so they were used as feature extractors. The number of AR features equals the model order multiplied by the 30 EEG channels; therefore, the 3rd-, 4th-, and 5th-order AR models produced 90, 120, and 150 features, respectively.
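A minimal sketch of AR feature extraction for one 1-s, 30-channel window follows. The paper does not state how the AR coefficients were estimated (Burg, Yule-Walker, and least squares are common choices); this sketch assumes ordinary least squares per channel, with no intercept term, and the function names are hypothetical.

```python
import numpy as np


def ar_coefficients(x: np.ndarray, p: int) -> np.ndarray:
    """Estimate a(1..p) in x(t) = sum_k a(k) x(t-k) + e(t) by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k-1 holds the lagged samples x(t-k) for t = p, ..., n-1.
    lagged = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    target = x[p:]
    coef, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return coef


def ar_features(window: np.ndarray, p: int) -> np.ndarray:
    """Concatenate the AR(p) coefficients of every channel: length = channels * p."""
    return np.concatenate([ar_coefficients(channel, p) for channel in window])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = rng.standard_normal((30, 200))  # one 1-s, 30-channel window at 200 Hz
    for p in (3, 4, 5):
        print(p, ar_features(window, p).shape)  # 90, 120 and 150 features
```

The feature count grows linearly with the order, which is why higher orders both add dimensions and, as noted above, risk fitting noise.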
To improve recognition accuracy, reduce the training time of the recognition model, and increase learning speed, we focus on reducing the dimension of the high-dimensional feature parameters by selecting the principal components with high contribution rates as the indicators for recognising fatigue from EEG signals [46], [47].
The dimension reduction algorithm is described as follows: 1) Calculate the weights: assuming the extracted features are n-dimensional, an SVM is trained on the n-dimensional data to obtain the classification accuracy $A$. The first feature dimension is then removed, and an SVM trained on the remaining $n-1$ dimensions gives the classification accuracy $A_1$. Next, only the second dimension is removed, giving $A_2$, and so on. In this way, the accuracies $A_1, A_2, \ldots, A_n$ of the n classifications are obtained, and subtracting each from $A$ gives n differences:

$$D_i = A - A_i, \quad i = 1, 2, \ldots, n. \tag{2}$$

A positive difference indicates that the feature dimension has a positive influence on classification; otherwise, its influence is negative. The n differences are normalised as the weights of the feature dimensions.
Normalisation function:

$$w_i = \frac{D_i - D_{\min}}{D_{\max} - D_{\min}}, \tag{3}$$

where $D_i$ is the value before normalisation, $D_{\max}$ and $D_{\min}$ are the maximum and minimum of the n differences, and $w_i$ is the normalised weight.
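2) From the weights $w_i$ ($i = 1, 2, \ldots, n$), construct the diagonal weight matrix $W_{n\times n}$:

$$W = \operatorname{diag}(w_1, w_2, \ldots, w_n) = \begin{pmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_n \end{pmatrix}. \tag{4}$$

3) Write the extracted data sample set as an $m \times n$ matrix:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}. \tag{5}$$

4) Use the weight matrix $W$ to construct the weighted data:

$$Z_{m\times n} = X_{m\times n} W_{n\times n}. \tag{6}$$

5) Calculate the covariance matrix $C$ of the weighted data matrix $Z_{m\times n}$:

$$C = \frac{1}{m-1}\left(Z - \mathbf{1}\bar{\mathbf{z}}\right)^{\top}\left(Z - \mathbf{1}\bar{\mathbf{z}}\right), \tag{7}$$

where $\bar{\mathbf{z}}$ is the $1 \times n$ row vector of column means of $Z$ and $\mathbf{1}$ is an $m \times 1$ vector of ones.

6) Decompose the covariance matrix: calculate its eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ (arranged in descending order) and the corresponding unit eigenvectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$:

$$C\,\mathbf{u}_i = \lambda_i \mathbf{u}_i, \quad i = 1, 2, \ldots, n. \tag{8}$$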

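7) Select the principal components according to the cumulative contribution rate and construct the mapping matrix $P$. Of the n principal components, the cumulative contribution rate of the first $i$ ($i \le n$) components is

$$\eta_i = \frac{\sum_{j=1}^{i} \lambda_j}{\sum_{j=1}^{n} \lambda_j}. \tag{9}$$

According to the cumulative contribution rate, the first $k$ eigenvectors are selected ($k$ being the smallest $i$ for which $\eta_i$ reaches the chosen threshold) and combined into the mapping matrix

$$P_{n\times k} = \left(\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\right). \tag{10}$$

8) Use the mapping matrix to obtain the reduced EEG features:

$$Y_{m\times k} = Z_{m\times n} P_{n\times k}. \tag{11}$$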

IV. EXPERIMENTAL RESULTS AND ANALYSIS
SVM was adopted as the classifier in the fatigue driving classification experiments. SVM is an efficient supervised binary classifier that plays an important role in classifying small-sample, nonlinear, and high-dimensional data, and it has been widely used in EEG classification research [48]-[53]. We designed three experimental methods: (1) SVM; (2) PCA-SVM (with a cumulative contribution rate of 0.95); and (3) WPCA-SVM (with a cumulative contribution rate of 0.95).
To verify the reliability of the experimental results, we used 10-fold cross-validation: the dataset is divided into 10 parts, with 1 part used as the test set and the remaining 9 as the training set. The procedure is repeated 10 times so that each part serves as the test set once, and the average of the 10 test results is taken as the final result.
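The 10-fold procedure can be sketched as below. This is an illustrative sketch: shuffling before splitting is an assumption (the paper does not say whether samples were shuffled), and `fit_predict` is a hypothetical callable standing in for training and applying a classifier such as an SVM.

```python
import numpy as np


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index arrays; every sample lands in exactly one test fold.

    Shuffling before splitting is an assumption, not stated in the paper.
    """
    perm = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(perm, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Average test accuracy over k folds; fit_predict(X_train, y_train, X_test) -> labels."""
    scores = []
    for train, test in kfold_indices(len(y), k, seed):
        pred = fit_predict(X[train], y[train], X[test])
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))
```

With 1200 windows per subject, each fold holds 120 samples. Any step that learns from the labels, such as the WPCA weights and mapping matrix, would ideally be fitted inside `fit_predict` on the training part only, so that the test fold stays unseen.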
To assess the classification results, the following three indicators are used [4]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{12}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \tag{13}$$

$$\text{Specificity} = \frac{TN}{TN + FP}. \tag{14}$$

Here, TP is the number of positive samples correctly identified (samples correctly recognised as fatigue driving); TN is the number of negative samples correctly identified (samples correctly recognised as normal driving); FN is the number of positive samples not recognised (fatigue-driving samples classified as normal driving); and FP is the number of negative samples not recognised (normal-driving samples classified as fatigue driving). Accuracy is the proportion of correctly classified samples among all samples, sensitivity is the classification accuracy on the positive samples, and specificity is the classification accuracy on the negative samples.
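The three indicators can be computed from predicted and true labels as in the following sketch; the function names are hypothetical, and fatigue driving is assumed to be encoded as label 1 (the positive class).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN), treating `positive` (fatigue driving) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def indicators(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from the confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # accuracy on fatigue (positive) samples
        "specificity": tn / (tn + fp),  # accuracy on normal (negative) samples
    }


if __name__ == "__main__":
    counts = confusion_counts([1, 1, 0, 0], [1, 0, 0, 1])
    print(counts, indicators(*counts))  # (1, 1, 1, 1) with all three indicators 0.5
```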
For the EEG data of eight subjects, the experimental results are shown in Fig. 4.
Based on the experimental results of the eight subjects, we observed the following. Among the five feature extraction methods, the 4th-order AR model achieved the best classification results, and with WPCA-SVM all three indicators reached their highest values. Individual differences had a significant impact on the experiment; for example, the accuracy of subject 2 was clearly lower than that of the other seven subjects. The averaged test results for the eight subjects are shown in Table 1.
Comparing the three experimental methods, the three performance indicators of WPCA-SVM were better than both PCA-SVM and single SVM without dimension reduction.
Comparing the five feature extraction methods:
i) 3rd-order AR features: with WPCA-SVM, the accuracy, sensitivity, and specificity increased by 5.71%, 5.85%, and 6.12%, respectively, compared with SVM alone, and by 3.50%, 3.88%, and 3.14%, respectively, compared with PCA-SVM.
ii) 4th-order AR features: with WPCA-SVM, the accuracy, sensitivity, and specificity increased by 4.39%, 4.33%, and 4.51%, respectively, compared with SVM alone, and by 3.11%, 2.97%, and 3.28%, respectively, compared with PCA-SVM.
iii) 5th-order AR features: with WPCA-SVM, the accuracy, sensitivity, and specificity increased by 6.78%, 7.24%, and 6.41%, respectively, compared with SVM alone, and by 5.28%, 5.57%, and 4.98%, respectively, compared with PCA-SVM.
iv) PSD features: with WPCA-SVM, the accuracy, sensitivity, and specificity increased by 6.75%, 7.06%, and 6.56%, respectively, compared with SVM alone, and by 5.12%, 5.05%, and 5.20%, respectively, compared with PCA-SVM.
v) DE features: with WPCA-SVM, the accuracy, sensitivity, and specificity increased by 5.82%, 6.01%, and 5.67%, respectively, compared with SVM alone, and by 3.99%, 4.06%, and 3.89%, respectively, compared with PCA-SVM.
The classification results were best when the 4th-order AR model was used. With this feature extractor the baseline was already high: all three indicators exceeded 93% with SVM alone and 95% with PCA-SVM. WPCA-SVM therefore still achieved the highest values, but by a relatively small margin (the smallest accuracy gain among the five feature extraction methods).
We compared the number of features and the amount of data before and after dimension reduction, where the amount of data was calculated as

$$\text{amount of data} = \text{samples} \times \text{channels} \times \text{features}. \tag{15}$$

The results averaged over the 8 subjects are shown in Table 2. Dimension reduction with PCA significantly reduced the feature dimension, and WPCA reduced the feature dimension (and hence the amount of data) even further, which makes the proposed WPCA an effective solution to the EEG big data processing problem.
The following overall observations can be drawn: 1) Comparing the three experimental methods, when the SVM classifier was used alone the feature dimension was high, which made the classification task more difficult. When PCA was used to reduce the dimension, each feature dimension was treated equally, although different features play different roles in the recognition process. When WPCA was used, the weights were determined by the importance of each feature for recognition, which strengthened the key features and attenuated redundant feature information. Therefore, WPCA-SVM outperformed SVM and PCA-SVM on all indicators.
2) The features extracted by the 4th-order AR model gave the best classification results, with all three indicators reaching their highest values compared with the other four feature extraction methods. This was because the 4th-order AR model could track EEG signals accurately without capturing too much noise. With the 3rd-order AR model, only three features were extracted per channel, so the EEG signals could not be characterised well, whereas the 5th-order AR model captured more noise because of its higher order. When PSD or DE was used to extract features, the performance indicators were lower than those of the 4th-order AR model. Thus, better classification results are obtained with the 4th-order AR model.
3) Due to individual physiological differences, the accuracy for Subject 2 was slightly lower than for the other subjects. The experimental results showed that the 4th-order AR model extracted EEG features well, which is the basis for accurate real-time classification. Compared with SVM and PCA-SVM, the proposed WPCA-SVM achieved higher fatigue detection accuracy, although the extent of the improvement was affected by individual differences. Although the accuracy of fatigue state detection differed across subjects, accuracy improved overall with WPCA-SVM; therefore, individual differences hardly affected the effectiveness of the proposed WPCA method.
4) Because EEG is sampled at a high rate, even a short time interval contains a large amount of raw sample data, which leads to a big data analysis problem. Large-scale data present challenges for real-time analysis and storage. Accordingly, we proposed a WPCA algorithm that reduces the dimension of the original data according to the impact of different attributes on the classification results. The algorithm reduced the feature dimension and improved classification accuracy, which facilitates big data processing.

V. SUMMARY
In this paper, an EEG-based IoV traffic management system has been proposed to improve traffic safety. We proposed a feature reduction method, based on a novel WPCA algorithm, to reduce the dimension of EEG signals and meet the real-time requirement. To verify the algorithm, we carried out a simulated driving experiment involving eight subjects. For comparison, the 3rd-, 4th-, and 5th-order AR models, PSD, and DE were used as feature extractors. The weights were determined according to the importance of each feature for recognition, and SVM was selected as the classifier. Accuracy, sensitivity, and specificity were used as the classification evaluation indicators, and three experimental methods (SVM, PCA-SVM, and WPCA-SVM) were designed for each feature extractor. The experimental results indicated that features extracted with the 4th-order AR model gave the highest classification accuracy for the 8 subjects. Compared with SVM and PCA-SVM, the proposed WPCA-SVM achieved the best performance with every feature extraction method. Overall, the algorithm greatly reduced the amount of data and improved classification accuracy, making it well suited to big data processing.
Cheng et al. proposed an EEG-based prediction system that transforms the measured EEG record into image-like data for estimating the drowsiness level of drivers [16]. Gao et al. proposed a novel relative wavelet entropy complex network for improving EEG-based fatigue driving classification [17]. Foong et al. proposed an iterative negative-unlabeled learning algorithm for cross-subject detection of passive fatigue from labelled alert and unlabelled driving EEG data [18]. Luo et al. proposed an adaptive multi-scale entropy feature extraction algorithm for fatigue driving detection [19]. Han et al. introduced complex network theory to study the evolution of brain dynamics under different EEG rhythms during several periods of the simulated driving experiment [20].
The autoregressive (AR) model is used to extract features from the EEG signals recorded from drivers during a simulated driving experiment. We calculate the influence of each feature on fatigue-state classification performance, and the accuracy reduction values of the features are then normalised as their weights. SVM is selected as the classifier, and accuracy, sensitivity, and specificity are used as the classification evaluation indicators. In addition, three sets of control experiments (SVM, PCA-SVM, and WPCA-SVM) are designed.

FIGURE 2. Experimental setup: (a) The experimental scene from the perspective of the researchers.(b) Brain caps of the neuroscan system.

TABLE 1. Average results for eight subjects.

TABLE 2. Comparison of features and the amount of data.