Diverse Feature Blend Based on Filter-Bank Common Spatial Pattern and Brain Functional Connectivity for Multiple Motor Imagery Detection

Motor imagery (MI) based brain-computer interface (BCI) is a research hotspot and has attracted lots of attention. Within this research topic, multiple MI classification is a challenge due to the difficulties caused by time-varying spatial features across different individuals. To deal with this challenge, we tried to fuse brain functional connectivity (BFC) and one-versus-the-rest filter-bank common spatial pattern (OVR-FBCSP) to improve the robustness of classification. The BFC features were extracted by phase locking value (PLV), representing the brain inter-regional interactions relevant to the MI, whilst the OVR-FBCSP is used to extract the spatial-frequency features related to the MI. These diverse features were then fed into a multi-kernel relevance vector machine (MK-RVM). The dataset with three motor imagery tasks (left hand MI, right hand MI, and feet MI) was used to assess the proposed method. Experimental results not only showed that the cascade structure of diverse feature fusion and MK-RVM achieved satisfactory classification performance (average accuracy: 83.81%, average kappa: 0.76), but also demonstrated that BFC plays a supplementary role in the MI classification. Moreover, the proposed method has a potential to be integrated into multiple MI online detection owing to the advantage of strong time-efficiency of RVM.

classified depending on the frequency of the signal. In particular, the alpha activity (8)(9)(10)(11)(12) and the beta activity (12)(13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30) are mostly related to MI [3]. In addition, the increase or decrease in the activity at a certain frequency band locked to the onset of the MI refer to the event-related synchronization and desynchronization (ERS/ERD), respectively [4]. And thus, ERS and ERD are always used for characterizing MI [4]. Typically, the imaginary movements are four categories, which are left hand movement, right hand movement, movement of the feet, and movement of the tongue. More than two categories included in the classification forms multi-class classification. To extract effective features is quite important for the MI classification.
The feature extraction of MI-based EEG is a process that extracts discriminative information from filtered EEG signals. And such extracted information directly influences the classification accuracy of a classifier for MI. Typically, the basic feature extraction methods include the time-domain approaches and frequency-domain techniques. For example, autoregression and quaternions are typical time-domain methods and Fourier transform is based on frequencydomain [5]- [8]. However, independent use of time-domain or frequency-domain method may cause absence of features due to the ignorance of the spectral information or temporal features. Therefore, time-frequency domain techniques seem to satisfy both sides which combines spectral and temporal information together [9]. For example, Zhou et al. used a cascade structure in which the discrete wavelet transforms (DWT) decomposed EEG signals whereas the Hilbert transform (HT) could transform the decomposed one to wavelet envelop [10]. In addition to timefrequency domain techniques for feature extraction, the spatial domain analysis is also widely used for the classification of MI-based EEG signals [11]. Specifically, common spatial pattern (CSP) as well as its variants such as the common spatio-spectral pattern (CSSP) [12], the common sparse spatio-spectral patterns (CSSSP) and sub-band common spatial pattern (SBCSP) have attracted many interests [13], [14]. The variants of CSP aim to speed up the computational efficiency and improve classification accuracy. In this study, we propose a one versus the rest filter-bank common spatial pattern (OVR-FBCSP) method for the feature extraction of MI-based EEG.
The feature extraction can be considered in terms of the different domains of the signal. However, the information between different nodes of the electrode can show the property of neuron populations and thus neural connectivity should be paid attention during feature extraction. Recently, through the analysis of neural connectivity in the brain, the general function and communication between different regions of the brain are described. For example, Liang et al. [15] and Lee et al. [16] proved the functional connectivity in the process of motion imagination planning. Gong et al. proposed a brain network modeling method based on time-frequency cross mutual information of four classes of MI. Through statistical analysis and topological feature analysis, they observed significant differences in response level, response time, and activation target of four classes of MI tasks [17]. Moreover, Xu et al. proposed to use phase synchronization information to extract features to classify more than one category of MI of the same limb [18]. More remarkable is that Li et al. combined ERD/ERS analysis with dynamic networks in different MI stages to explore the dynamic processing of MI information. The results showed that the specific dynamic network structure conformed to the ERD/ERS evolution model [19].
Apart from the feature extraction method influences the accuracy and running speed of MI-based EEG classification, the classifier is another important issue. Typical classifier includes support vector machine (SVM), linear discriminant analysis (LDA), logistic regression, and artificial neural network (ANN) [8], [20], [21]. To further improve the classification accuracy of MI-based EEG signal, deep learning approach and recurrent neural network have been widely paid attention to [22], [23]. For example, Cheng et al. used deep neural network (DNN) as the classifier for the exploration of MI-based EEG pattern in stroke patients [24]. And DNN method (74.9%) gained a higher classification accuracy than that with SVM (67.7%). To achieve a trade-off between computational efficiency and classification accuracy, the multi-kernel method has attracted many interests [25]. In MI-based EEG classification, due to the variability of the EEG signal, single kernel function cannot be suitable for all imaginary movements. Here, we use a multi-kernel relevant vector machine (MK-RVM) to classify features of selected band subsets.
In this paper, we propose a cascade structure of one-versusthe-rest filter-bank common spatial pattern (OVR-FBCSP) method and MK-RVM for the classification of three imagery movements (left-hand movements, right-hand movements, and feet movements). This dataset was shared by the intelligent information processing and human-computer interaction laboratory at the Anhui University. The arrangement of this work is organized as follows. Section 2 shows the method of our study, which includes the experiment and data acquisition, signal preprocessing, feature extraction, feature selection, and classifier. Section 3 provides results of the method assessment. Discussions are presented in section 4. Finally, we give a conclusion in section 5.

II. METHOD
In this section, we focus on data acquisition and preprocessing, feature extraction, feature selection, and the classifier.

A. DATA ACQUISITION AND PREPROCESSING
The data were collected at the intelligent information processing and human-computer interaction laboratory of Anhui University. Six healthy subjects ranged from 22 to 28 years old participated in the MI experiment. During the experiment, subjects sit in front of a computer screen. The duration of each trial was 10 s and the onset of each trial was hinted with a ''beep'' tone. VOLUME 8, 2020 Then the screen displays the cue arrows with the different directions which promote the subject to perform imagery movements (left-hand movements, right-hand movements, and feet movements). To prevent the subject from pre-imagining and to obtain stable EEG signals, the appearance of the arrows during the experiment was random with a duration of 6 s. After that, the computer displays a black screen and thus the subject can relax and wait for the next trial experiment. The data segments between 0.5-2.5 s (0.5-2.7 s for Subject 3, 0.5-2.6 s for Subject 5) relevant to the onset of visual cues presented to participants are used in this study. Each subject underwent 6 sessions, each of which comprises 75 trials (The numbers of trials for each category are not identical due to randomization). During the experiment, EEG signals were collected from the scalp using a headset with 14 electrodes (Fp1, Fp2, FC3, FCz, FC4, C3, Cz, C4, CP3, CPz, CP4, O1, Oz, and O2). Nine of them were used in this study (see Fig. 1). The sampling frequency was 250 Hz, and the data were sampled with a notch filter of 50 Hz and a band-pass filter of 0.5-100 Hz [26]. The details can be found in [27]. Then, we used the EEGLAB toolbox to do the data preprocessing, which included baseline subtraction, common average reference (CAR), and band-pass filtering. As the amplitude and spectrum power in µrhythm (8-12 Hz) and βrhythm (14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30)

B. FEATURE EXTRACTION
The feature extraction was performed by a combination of the FBCSP method and the brain functional connectivity (BFC) method.

1) FILTER-BANK COMMON SPATIAL PATTERN
The projection matrix C of CSP is constructed for filtering band data. In this paper, the first three pairs and the last three pairs of eigenvalues in the projection matrix are selected, and a total of six independent CSP filters are used for spatial filtering. Then we define the triple classes of MI EEG signals as X 1 , X 2 , and X 3 , respectively. Then the covariance model of multiple-class MI space is shown as following [28]: The dimension of X i is the multiplication of the number of channels, the time window, and the sampling frequency. R i is the spatial covariance of EEG signals for each MI class. Then the composite covariance matrix can be denoted as: The singular value decomposition of covariance matrix R can be carried out as: where U 0 is the unitary matrix of principal components, and C is the diagonal matrix of eigenvalues. After calculating singular value decomposition of eigenvector and eigenvalue matrix, the transformation matrix of covariance matrix can be obtained: In this paper, we need to calculate three OVR-CSP patterns. For example, the left-hand MI CSP versus the rest CSP can be calculated as:R Then we transform R 1 andR 1 into: Furthermore, we do eigenvalue decomposition for E 1 andẼ 1 : By combining the equation (1)- (7), we can get: where U 1 is the common eigenvector matrix. Then we can get the CSP projection matrix C 1 = U T 1 Q and the selected features of left-hand MI can be obtained as following [29]: where n is the number of samples in the class i. Six frequency bands for each trial need 6 × 6 CSP filters for feature extraction. And six eigenvalues extracted from each frequency band are merged to obtain 36 eigenvalues. Then we carried out such feature extraction with normalization for each frequency band and thus the normalized feature can be obtained as: Furthermore, three classes of OVR-FBCSP features can be obtained as: F fbcsp of each trail is used to form 108 eigenvalues.

2) FUNCTIONAL BRAIN NETWORK
In this study, we use the phase locking value (PLV) to calculate brain connectivity as it has good performance in functional brain network analysis [18], [30]. The expression of PLV is as follows: where N is the total number of sample points in a trial, σ 1 (t, k) − σ 2 (t, k), k = 1, 2, . . . , N represents the phase difference of a pair of electrodes at time t. Due to the binary network will lose a lot of information in the process of binarization, we choose the weighted network as the feature of this study. PLV value is between [0, 1], in which 0 is no connection between channels whereas 1 is the perfect connection between channels [18]. We use four measures for the evaluation of PLV which are degree, clustering coefficient, average path length, and local efficiency. These measurement methods are obtained by graph theory analysis through the brain connection toolbox [31].
(1) Degree: The degree of a node represents the number of edges of a node, which can be calculated with the following formula: where G is the set of all nodes in the network and w ij is an element of the weighted network matrix. In the weighted network, the higher degree of a node, the more important it is.
(2) Clustering coefficient: The clustering coefficient indicates the clustering degree of brain function network nodes, which can be calculated with the following formula: where R represents the number of neighboring nodes that directly connected to the node i.
(3) Average shortest path length: The average shortest path length reflects the information transferability within the brain function network, which can be expressed as: where L ij is the shortest path length between node i and j, M is the number of nodes.
(4) Local efficiency: Local efficiency is used to measure the ability of local information transmission and processing which can be expressed as: where M is the number of nodes. E global (G i ) is the global efficiency of node i.
The measures of the PLV are extracted as the features of functional brain network.

3) FEATURE FUSION
The feature dimension of clustering coefficient and local efficiency extracted in each frequency band of each trail are Channels × Trials, while the feature dimension of average node degree and average shortest path length extracted in each frequency band of each test are 1× Trials.
To check the effect of average shortest path length, we set up two combined schemes (PLV1: clustering coefficient, local efficiency, and average node degree; PLV2: clustering coefficient, local efficiency, average node degree, and average shortest path length) for the feature fusion with FBCSP [31]. Finally, FBCSP features and PLV complex network measure features were combined:

C. FEATURE SELECTION
In this study, we use the F-score feature selection method as it is a simple but effective method for the selection of the discriminative power of each feature in a feature set [33]. The dataset is divided into positive and negative classes. For example, the left hand is defined as positive class n + whereas the right hand and feet are defined as negative class n − . We use F-score to sort the features and select the top n feature sets in the ranking list. And the value of n varies from 1 to half of the total [34]. Given the train dataset x k , k = 1, 2, . . . , m. Then, the F-score of the f th feature of the dataset is defined as [35]: are the mean value of the f th feature on the whole training dataset, the mean value on the positive dataset, and the mean value on the negative dataset, respectively.x

D. CLASSIFIER
In this paper, we use MK-RVM to realize multi-class MI recognition. The log marginal likelihood (A) = log P(Y|K, A) = log P(Y|K, W)P(W |A )dW which can further be expressed as [36]: where C = I + K T A −1 K , K is the kernel. A is the scales matrix. y c is the c th column of regressor target Y ∈ C×N . W ∈ C×N is the regressor that expresses weight for a specific class. And the log marginal likelihood can be also decomposed into: where s i is the sparsity factor and q ci is the new multi-class quality factor. The sparse factor is the measure of overlap between a sample k i and the ones already included in the model. The quality factor q ci measures the quality of a sample that describes a specific class. α i is a hyperparameter. We can get stationary points with the derivation ∂ (A)/∂α i = 0 and α i can be expressed as: So A can be updated with Eq. (22) whereas the regressor W can update as: In addition, Y can be updated as: whereỹ cn is a standardized noise model, u ∼ N (0, 1) and the Gaussian cumulative distribution function. k n is each row of kernel K.
In this paper, we use the 5-fold cross-validation to do the classification [6]. The whole procedure can be described as follows: Step 1 Preprocessed each session was randomly divided into five sets, of which four sets are training samples (80%) and the remaining set is testing samples (20%).
Step 2 FBCSP and PLV measures were calculated, which was followed with feature fusion.
Step 3 Fused features were selected by the F-score feature selection method.
Step 4 The selected features were fed into the MK-RVM classifier.

E. PERFORMANCE MEASURE
Some measurement performances are used in this paper. The classification accuracy can be expressed by the following formula [34], [37]: where TP is true positive, TN is true negative, FP is false positive, FN is false negative. (TN + TP) is the number of correctly classified samples. N total represents the total number of test samples. Precision and recall are two measures widely used in the field of statistical classification, which are used to evaluate the classification results [37].
As a performance indicator of BCI, kappa is often used to measure multiple classes of problems. It is considered more robust than the overall agreement (accuracy) because it needs to take into account the chances of the agreement occurring [34]. We can be expressed as [38]: where p 0 is the overall agreement of the test samples, which is equal to the accuracy. p e is the chance agreement probability value of the test samples, which can be obtained by the following formula: where a i and b i represent the sum of i th class real samples and i th class predicted samples of the confusion matrix, respectively. N total is the total number of test samples. The receiver operating characteristic (ROC) is an index to evaluate the performance of classifiers. In ROC space, the abscissa of each point is the false positive rate (FPR) and the ordinate is the true positive rate (TPR), which describes the trade-off between TP and FP.
where FPR represents the probability that a negative case is misclassified into a positive one, TPR represents the probability of pairing positive examples.

III. RESULTS
In the performance assessment, the data of each subject were divided into five equal sets randomly, of which four sets were for training and the remaining set was for testing.  Each set was used as the testing samples one time to perform the 5-fold cross-validation. To demonstrate the classification performance, we compared the proposed method (FBCSP+PLV2) to the other four methods (PLV1, PLV2, FBCSP, FBCSP+PLV1, FBCSP+PLV2) and their accuracies are shown in Fig. 2. The best classification accuracy was found in Subject 4 and the classification accuracy of the proposed method is 94.89%±2.91% (Mean±SEM). Kappa as a measure multi-class classification performance [34] and the result is shown in Fig. 3. The kappa value for Subject 4 with the proposed method can reach 0.923±0.046. To intuitively show the brain activity during MI, we draw topographical maps of OVR-FBCSP features extracted from EEG signals of subject 1. As mentioned in the section of feature extraction, selected features are transformed with corresponding projection matrix C in which the first three pairs of eigenvalues represent the selected features and the last three pairs are fused with the rest features. As can be seen from Fig. 4, when the subject does the leftward/rightward MI, the contralateral hemisphere is highly active. In addition, when the subject does the MI of feet, the central area of the selected nodes is active whereas the peripheral of the nine nodes is inactive with the combination of other two MI protocols (left-hand and right-hand movements). The connectivity matrix for each MI trial was constructed based on the EEG segment firstly, then they were divided to three groups according to the label. Fig. 5 shows mean connectivity matrices of left hand, right hand, and feet MI for Subject 1. The average connectivity matrix of left hand, right hand and feet MI is similar, and the diagonals of the three connectivity matrices are relatively large, which proves that the strong synchronization mainly occurs in temporal and parietal channels.   However, the upper right and lower left corners are dark blue, which indicates that the synchronization between frontal and occipital channels is weak. Fig. 6 shows the confusion matrix of the sum of 5-fold cross-validation of 6 sessions for each subject under the FBCSP+PLV2 method. Fig. 7 shows the confusion matrix of the sum of 5-fold cross-validation of 6 sessions for each subject and the ROC curve of each subject and the average curve of three categories for each subject under the FBCSP+PLV2 method. We can see that this method has good classification performance and robustness. We calculate the precision and recall rate according to the confusion matrix under the FBCSP+PLV2 method, as shown in Table 1. Table 2 compares the FBCSP+PLV2 method with the other four methods and uses a t-test to calculate the p-value.
Furthermore, we also compared the time consumption among the adopted five methods ( Table 3). The time cost is the averaged time of 5-fold cross-validation of the testing  time for six subjects. The method combined by FBCSP and PLV2 has more features than the other four methods and thus takes a long-time cost. However, compared with SVM as the baseline, RVM showed stronger timely effectiveness.  The multiple MI ROC obtained from motor imagery EEG for Subject 1 to Subject 6 under the FBCSP+PLV2 method, where Class1 is the left hand category, class2 is the right hand category, and class3 is the feet category.

IV. DISCUSSION
In this paper, we focus on classify three classes of MI data derived from the EEG signal. First, the collected data are pre-processed with the selection of the time window and frequency band. Then we extract features of pre-processed data, using the fusion of FBCSP and PLV complex network measure. Finally, we feed selected features into MK-RVM to classify the multiclass of MI. Here we aim to discuss the proposed classification structure in terms of time window optimization, feature fusion, classifiers, MI experiment, Comparison of different training protocols of MI tasks, and the limitation and prospect of the experiment.

A. TIME WINDOW SELECTION
The selection of the time window can remove the segmentation of EEG signals that have nothing to do with MI or eliminate data that is not the key time point of MI [27]. Our previous study has also shown that choosing the optimal time window for each subject can indeed improve the classification accuracy [39]. Although the optimal time windows of each subject's MI are different, it is found that the time between 0-2.5 s after the onset of visual cue can benefit the classification considering adequate sample points for subsequent data processing [40]. In this study, data between 0.5-2.5s (Subject 3 takes 0.5-2.7 s, Subject 5 takes 0.5-2.6 s) after the onset of the visual cue direction were used.

B. FEATURE FUSION STRATEGY
In this paper, by fusing the features of FBCSP and PLV complex network measure, we have achieved a good classification performance of multiple classes of MI. Feature fusion technique has also been applied in some other EEG-analysing related tasks. Ai et al. proposed a feature extraction method that combined the features of brain function network and local characteristic-scale decomposition (LCD) together. The good performance of this method was verified on the self-designed real-time BCI robot control and has put forward four classes of dataset [41]. In addition, to measure the complexity of EEG time series, Wang et al. proposed a fusion entropy (sample entropy, approximate entropy, and spectral entropy) analysis method for EEG and EOG signals. Results showed that the average accuracy of the fusion entropy analysis method combined with EOG and EEG can reach 99.1±1.2% [42]. Furthermore, Zhu et al. discussed the performance of multi-user MI-BCI idle detection based on common spatial pattern (CSP) and brain network features and proposed several cross-training feature fusion strategies [30]. The advantage of feature fusion was also found in other classification studies [43]- [45]. Based on the above studies, we can see that feature fusion outperforms single feature in the classification.

C. CLASSIFIER
The classifier selection plays a key role in the recognition of multiple MI tasks. MK-RVM has shown good performance in many aspects of classification tasks [36], [46], [47]. However, most of the studies on multi-class MI used multiple binary RVM. For example, Dong et al. added chaos dynamics into the kernel function of the RVM classifier in the framework of one versus one common spatial pattern (OVO-CSP) and thus made it excel in multi-class MI tasks [48]. Zhang et al. combined the location of EEG dipoles with CSP to extract features from multi-class MI and extracted features were fed as the input to RVM [49]. Furthermore, Dong et al. proposed a new hybrid kernel RVM in which fused Gaussian kernel and polynomial kernel together. In MI tasks, by using the OVO-CSP strategy, phase space CSP (PSCSP) features were extracted and fed into the RVM classifier [50]. However, no relevant MK-RVM research has been found in multiple classes of MI tasks based on EEG signals and thus we use the MK-RVM classifier to classify three classes of MI tasks, and the best average accuracy can reach 83.81% (kappa: 0.76).

D. COMPARISON OF DIFFERENT TRAINING PROTOCOL OF MI TASKS
Here we summarize the related research of MI in recent years (Table 4). It can be seen from the Table 4 that the research on multiple classes of MI has gradually increased, but most of the research is still focused on offline data analysis, and there are few online analysis experiments. Compared with traditional classifiers and deep learning methods, the proposed method can achieve a medium to high classification accuracy.
We use FBCSP and PLV feature fusion, and then use MK-RVM for multiple MI recognition. Compared with the other latest methods (see Table 4), the method based on the feature fusion of FBCSP and PLV achieved better performance. This is due to the diverse features representing different MI-related information and help in the classification of MI tasks. The results demonstrated that the combination of FBCSP and PLV can extract discriminative features of both spatial-frequency and brain inter-regional interactions relevant to MI tasks. These complementary features could also benefit the understanding of underlying neural mechanisms of MI.

E. THE LIMITATION AND PROSPECT OF THE EXPERIMENT
There are some limitations to this study. First, our study performed the classification on three MI categories and the sample size for each category was not large. It would be better to evaluate the proposed method with more categories and more samples. Second, the proposed method is still an offline one and the processing of fused features takes a relatively long time. Moreover, the analysis has shown that in certain subjects, the selected channel number and frequency band have different optimal choices for different subjects [54]- [57]. Therefore, we will conduct more experiments on channel number and frequency bands in different subjects in future studies to verify this hypothesis.

V. CONCLUSION
In this paper, we proposed a cascade structure of feature fusion and MK-RVM for the classification of three-class MI tasks. The feature fusion integrates the OVR-FBCSP with PLV. The average classification accuracy reached 83.81%. The proposed method has a potential to be applied in real-time MI-based BCI applications. He has published more than 30 articles as the first and second author and participated in the compilation of two works. His current research interest includes solving the common, frequently occurring, and complicated clinical problems in neurosurgery.
ANASTASIOS BEZERIANOS (Senior Member, IEEE) received the degree in physics from the University of Patras, Patras, Greece, the degree in telecommunications from Athens University, and the Ph.D. degree in bioengineering from the University of Patras. He is currently a Professor with the N.1 Institute for Health, National University of Singapore, and a Visiting Professor with the Department of Computer Science, New South Wales University (NSWU), Canberra, ACT, Australia. He has been a Professor of medical physics with the Medical School, University of Patras, since 2004. His research interests include diverse areas spanning from artificial intelligence and robotics to biomedical signal processing, brain imaging, mathematical biology and systems medicine, and bioinformatics. His work was summarized in 140 journal articles and 217 conference proceedings publications and one book, and he holds two patents. He has research collaborations with research institutes and universities in Japan, China, and Europe. He is also a Fellow of the European Alliance for Medical and Biological Engineering and Science (EAMBES), and the Founder and the Chairman of the Biannual International Summer School on Emerging Technologies in Biomedicine. He is also an Associate Editor of the IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING (TNSRE), the PLOS One, and Neuroscience journal, and a Reviewer for several international scientific journals. He is also a Registered Expert of the Horizon 2020 Program of the European Union and a Reviewer of research grant proposals in Greece, Italy, Cyprus, and Canada.
JUNHUA LI (Senior Member, IEEE) received the Ph.D. degree in computer science from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China, in 2013. He is currently a Lecturer with the School of Computer Science and Electronics Engineering, University of Essex, Colchester, U.K. He was a Senior Research Fellow of the National University of Singapore, Singapore. His research interests include computational neuroscience, brain-computer interface, machine learning, and neurophysiological data analytics and their practical applications. He is an Associate Editor of IEEE ACCESS and a Review Editor of Frontiers in Human Neuroscience, and served as a guest associate editor of several special issues related to his research interests. VOLUME 8, 2020