Learning Common Time-Frequency-Spatial Patterns for Motor Imagery Classification

The common spatial patterns (CSP) algorithm is the most popular spatial filtering method applied to extract electroencephalogram (EEG) features for motor imagery (MI) based brain-computer interface (BCI) systems. The effectiveness of the CSP algorithm depends on optimal selection of the frequency band and time window from the EEG. Many algorithms have been designed to optimize frequency band selection for CSP, while few algorithms seek to optimize the time window. This study proposes a novel framework, termed common time-frequency-spatial patterns (CTFSP), to extract sparse CSP features from multi-band filtered EEG data in multiple time windows. Specifically, the whole MI period is first segmented into multiple subseries using a sliding time window approach. Then, sparse CSP features are extracted from multiple frequency bands in each time window. Finally, multiple support vector machine (SVM) classifiers with the Radial Basis Function (RBF) kernel are trained to identify the MI tasks and the voting result of these classifiers determines the final output of the BCI. This study applies the proposed CTFSP algorithm to three public EEG datasets (BCI competition III dataset IVa, BCI competition III dataset IIIa, and BCI competition IV dataset 1) to validate its effectiveness, compared against several other state-of-the-art methods. The experimental results demonstrate that the proposed algorithm is a promising candidate for improving the performance of MI-BCI systems.


I. INTRODUCTION
BRAIN-COMPUTER interfaces (BCIs) establish a direct connection between the brain and the external world that is independent of peripheral nerves and muscles [1], [2]. BCIs not only provide an alternative communication channel for disabled patients [3]-[6], but also have many promising applications for healthy users, such as environment control [7], [8] or fatigue detection [9], [10]. Electroencephalography (EEG) is commonly used to acquire brain signals for controlling a BCI because of its non-invasive nature, high temporal resolution, and low cost [11], [12]. However, EEG signals suffer from external noise and are prone to physiological artifacts, which poses challenges when using them for BCI control [13].
Among BCI systems, motor imagery (MI)-based BCIs can be more flexible in their applications than many other types of BCI because they can be operated asynchronously and can be more intuitive to control [14]-[17]. During MI, the rhythmic EEG activity is suppressed on the side of the brain contralateral to the limb the individual is attempting to control. This is the so-called event-related desynchronization (ERD) [18]. However, the spatial location in the brain, temporal onset, relative decrease in EEG power, and stability of the ERD are highly variable across trials, sessions, and individuals [19]. This poses a considerable challenge for designing MI-BCI systems.
To optimally extract EEG features that describe the ERD, researchers proposed the common spatial patterns (CSP) algorithm, which seeks spatial filters to extract the features that optimally discriminate different motor control attempts [20]. Due to its success, a large number of variants of CSP have emerged. For instance, regularization was introduced to CSP to solve the problem of overfitting [21]-[24].
The effectiveness of CSP depends on identifying the optimal EEG frequency band. However, the optimal filter frequency band is participant-specific, meaning a general solution with a fixed frequency band is not possible. Thus, several variants of CSP use different frequency bands, including methods such as sub-band CSP (SBCSP) [25], filter bank CSP (FBCSP) [26], discriminative FBCSP (DFBCSP) [27], and sliding window discriminative CSP (SWDCSP) [28]. However, in the above literature, the importance of the temporal onset of the ERD is often overlooked. The time course of the ERD following a movement cue varies over time and across participants [29]. Therefore, an optimal CSP method should account for this variability when training the spatial filter. However, only a few studies have tried to tackle this problem by employing approaches for automatic selection of the optimal time interval [30], [31].
This study proposes a novel framework, termed common time-frequency-spatial patterns (CTFSP), to learn sparse CSP features from multi-band filtered EEG data over multiple candidate time windows. The major innovations and contributions of this work can be summarized as follows: 1) We adopted a sliding time window approach to decode MI tasks. 2) In each sub time window, we extracted sparse CSP features from multiple candidate frequency bands. 3) We developed a novel classification method that makes use of a classifier fusion strategy. The final decision output is based on the results from multiple classifiers to reduce the risk of misclassification. Finally, we validated the efficiency of the proposed framework using three public EEG datasets.
The remainder of this paper is organized as follows. Section 2 introduces the three BCI Competition Datasets used in this study and describes the details of our proposed framework. Section 3 presents the experimental results, which are then discussed in Section 4. Section 5 concludes this work.

A. Data Description
Dataset 1 (DS1): The first dataset we use is Dataset IVa from the BCI Competition III [32]. It was recorded from five healthy participants labelled aa, al, av, aw, and ay. Visual cues were displayed for a period of 3.5 s, during which the participants were instructed to perform one of the corresponding MI tasks: left hand, right hand, and right foot imagination. Then, the participants were instructed to relax for a period of between 1.75 and 2.25 s. Each participant was asked to complete 280 trials. The timeline of this experiment is shown in Fig. 1(a). The original dataset includes [...]
Dataset 2 (DS2): The second dataset is Dataset IIIa from the BCI Competition III [32]. It was recorded from 60 EEG electrodes with a sample rate of 250 Hz from three participants labelled k3, k6, and l1. The participants were instructed to rest for 2 s at the beginning of each trial. At t = 2 s, an acoustic stimulus and fixation cross were presented. At t = 3 s, an arrow pointing left, right, up, or down was displayed for 1 s and the participant was asked to imagine a left-hand, right-hand, foot, or tongue movement until the cross disappeared at t = 7 s. The numbers of trials per class are 90, 60, and 60 for participants k3, k6, and l1, respectively. In this study, we only select the trials from two MI classes: left- and right-hand movements for participants k3 and l1; right-hand and tongue movements for participant k6. Fig. 1(b) shows the timeline of the experiment.
Dataset 3 (DS3): The third dataset is Dataset 1 from the BCI Competition IV [33]. This dataset was recorded from seven participants performing MI. At the beginning of each trial, a fixation cross was first displayed at the center of the computer screen for a period of 2 s. At t = 2 s, an arrow pointing left, right, or down was displayed and the participant was asked to perform the corresponding MI task (left / right hand and foot) until the cross disappeared at t = 6 s. Then, the participant was instructed to rest for 2 s. Fig. 1(c) shows the timeline of the experiment. Each participant was asked to complete 200 trials. We do not use the data from three participants labelled c, d, and e, because they are artificially generated. The dataset includes signals from 59 EEG channels, which are down-sampled to 100 Hz. More details can be found on the following website: http://www.bbci.de/competition/iv/.
Table I shows a list of the notations and definitions that are used later. A fifth-order Butterworth band-pass filter of 8-30 Hz was first used to filter out components unrelated to sensorimotor rhythms. Then, the filtered EEG signals are divided into seven frequency bands: mu band (8-13 Hz), two sub-bands of the mu band (8-10 Hz; 10-13 Hz), beta band (13-30 Hz), and three sub-bands of the beta band (13-18 Hz; 18-23 Hz; 23-30 Hz).
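The multi-band filtering step can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the function name `filter_bank` is hypothetical, and zero-phase filtering with `sosfiltfilt` is an assumption (it applies each filter forward and backward, which doubles the effective order).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# The seven candidate bands (Hz) listed in the text.
BANDS = [(8, 13), (8, 10), (10, 13), (13, 30), (13, 18), (18, 23), (23, 30)]

def bandpass(eeg, low, high, fs, order=5):
    """Butterworth band-pass along the last axis (assumed zero-phase)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)

def filter_bank(eeg, fs, bands=BANDS):
    """Filter `eeg` (channels x samples) to 8-30 Hz, then into each band.

    Returns an array of shape (n_bands, channels, samples).
    """
    broad = bandpass(eeg, 8, 30, fs)  # remove components outside 8-30 Hz
    return np.stack([bandpass(broad, lo, hi, fs) for lo, hi in bands])
```

Second-order sections (`output="sos"`) are used here because narrow bands such as 8-10 Hz can be numerically fragile in transfer-function form.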

C. Feature Extraction
CSP is a widely used feature-extraction method in MI-based BCIs. It optimizes a set of spatial filters to maximize the variance of one class while minimizing the variance of the other class. The average spatial covariance matrix of class k (k = 1, 2) can be computed as
$$C_k = \frac{1}{N_k} \sum_{i=1}^{N_k} X_{k,i} X_{k,i}^{T} \quad (1)$$
where $X_{k,i}$ is the band-pass filtered EEG (channels × samples) of the i-th trial of class k and $N_k$ is the number of trials in class k. We can find a spatial filter that maximizes the variance of one class and minimizes the other class by solving
$$\arg\max_{w} \frac{w^{T} C_1 w}{w^{T} C_2 w} \quad (2)$$
The above Rayleigh quotient can be transformed into the generalized eigenvalue problem
$$C_1 w = \lambda C_2 w \quad (3)$$
where λ and w are the generalized eigenvalue and eigenvector, respectively. The spatial filter $W_{csp} = [w_1, \ldots, w_{2m}]$ is formed by the eigenvectors corresponding to the m largest and m smallest eigenvalues.
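A minimal sketch of CSP under the definitions above is given below. The function names are hypothetical, and the normalized log-variance feature is an assumption: it is a common choice for CSP, but the text here does not state which feature transform was used.

```python
import numpy as np
from scipy.linalg import eigh

def avg_cov(trials):
    """Average spatial covariance over trials shaped (n_trials, channels, samples)."""
    return np.mean([x @ x.T for x in trials], axis=0)

def csp_filters(trials_1, trials_2, m=2):
    """Return W (channels x 2m) from the generalized problem C1 w = lambda C2 w."""
    c1, c2 = avg_cov(trials_1), avg_cov(trials_2)
    evals, evecs = eigh(c1, c2)  # eigenvalues come back in ascending order
    n = len(evals)
    keep = np.r_[np.arange(m), np.arange(n - m, n)]  # m smallest, m largest
    return evecs[:, keep]

def csp_features(trial, W):
    """Normalized log-variance of the spatially filtered trial (assumed feature)."""
    var = (W.T @ trial).var(axis=1)
    return np.log(var / var.sum())
```

With m = 2 this yields four features per frequency band and trial. Note that `eigh` requires the second covariance matrix to be positive definite, which may call for regularization when there are few trials.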

D. Feature Selection
By applying CSP on each of the filtered signals, we derive the following feature set
$$X = [x_{i,j}] \in \mathbb{R}^{N \times D} \quad (4)$$
where $x_{i,j}$ denotes the j-th feature extracted from EEG in the i-th trial, N is the number of trials, and D = 2m × 7 is the dimensionality of the feature set. When CSP is applied to multiple frequency bands, multiple features are generated. We used feature selection to reduce the dimensionality of these features and simplify the subsequent classification model. The least absolute shrinkage and selection operator (LASSO) is a penalized least squares method imposing an $L_1$-penalty on the regression coefficients [34], [35]. The LASSO estimates are defined as
$$\arg\min_{\beta, \beta_0} \left\{ \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{T} \beta \right)^2 + \lambda \sum_{j=1}^{D} |\beta_j| \right\} \quad (5)$$
where $y_i$ is the class label for trial i, $x_i$ is the D-dimensional feature vector for trial i, λ is a positive regularization parameter, and β and $\beta_0$ are regression parameters (β is a D-dimensional vector; $\beta_0$ is a scalar). The LASSO method does not depend on any classifier. Features whose coefficients are exactly 0 are automatically discarded. Thus, the most significant features are selected from multiple frequency bands.
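To illustrate how an $L_1$ penalty drives coefficients to exactly zero, the following sketch solves a LASSO problem by cyclic coordinate descent. The solver choice, the 1/(2N) scaling of the squared error, and the function names are assumptions made for illustration; the text does not say which solver was used.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t; returns exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2N) * sum (y - b0 - X b)^2 + lam * sum |b_j|."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n, d = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, resid = X - x_mean, y - y_mean  # centring absorbs the intercept
    beta = np.zeros(d)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0:
                continue
            resid += Xc[:, j] * beta[j]  # take feature j out of the fit
            rho = Xc[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= Xc[:, j] * beta[j]  # put the updated feature back
    return beta, y_mean - x_mean @ beta

def select_features(X, y, lam):
    """Indices of features whose LASSO coefficient is non-zero."""
    beta, _ = lasso_cd(X, y, lam)
    return np.flatnonzero(beta != 0)
```

Larger values of λ discard more features, so in practice λ would be tuned, for example by cross-validation on the training data.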

E. SVM Classification
The support vector machine (SVM) has broad applications in BCI systems. It finds an optimal hyperplane with the largest possible margin to separate the samples from two classes [36], [37].
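One variant of the algorithm is to solve the following optimization problem
$$\min_{\omega, b, \xi} \; \frac{1}{2} \omega^{T} \omega + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i \left( \omega^{T} \phi(x_i) + b \right) \geq 1 - \xi_i, \;\; \xi_i \geq 0, \;\; i = 1, \ldots, n \quad (6)$$
where $\phi(x_i)$ maps $x_i$ into a higher-dimensional space, C is the penalty parameter of the error term, and $\xi_i$ is the slack variable.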

G. Decision Output
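Three SVM classifiers are formed because three sub time windows are used. The decision output is based on the "Max Wins" voting strategy [38], in which each of the three binary SVM classifiers casts one vote and the winning class is the one receiving the most votes. With the two classes encoded as +1 and −1, the decision function $F_j$ of each SVM classifier is
$$F_j(x) = \mathrm{sign}\left( \omega_j^{T} \phi(x) + b_j \right), \quad j = 1, 2, 3 \quad (7)$$
The final decision output F is
$$F(x) = \mathrm{sign}\left( \sum_{j=1}^{3} F_j(x) \right) \quad (8)$$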
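A minimal sketch of "Max Wins" voting over the outputs of several classifiers is shown below. The function name `max_wins` and the array layout are illustrative assumptions.

```python
import numpy as np

def max_wins(votes):
    """Majority label per trial.

    votes: array of shape (n_classifiers, n_trials) holding predicted labels.
    """
    votes = np.asarray(votes)
    winners = []
    for trial_votes in votes.T:
        labels, counts = np.unique(trial_votes, return_counts=True)
        winners.append(labels[np.argmax(counts)])
    return np.array(winners)
```

With three binary classifiers a tie cannot occur. For an even number of voters, `np.argmax` would resolve ties in favour of the smallest label, which may not be the desired behaviour.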

A. Whole Framework
We first give an overview of the architecture of our method. Our goal is to learn a robust model that can achieve high classification accuracy. To achieve this goal, we propose a novel method, CTFSP, to learn sparse features from multi-band filtered EEG data across multiple time windows. Specifically, CTFSP is formulated as a combination of multi-band filtering, feature extraction, sparse feature selection, and classifier fusion, resulting in the overall architecture illustrated in Fig. 2. First, the whole MI period is segmented into multiple time windows. In each time window, multi-band filtering is used to increase the signal-to-noise ratio (SNR) of the EEG signals by extracting MI-related features from the mu band, beta band, and their sub-bands. Feature extraction and selection then learn sparse spatial features using the CSP algorithm and the LASSO method. The final decision output is determined by the voting result of three SVM classifiers.
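The segmentation of the MI period into sub time windows can be sketched as below. The window length and step in the usage note are hypothetical values chosen only so that three windows result; they are not taken from this study.

```python
import numpy as np

def sliding_windows(trial, fs, win_s, step_s):
    """Split a trial (channels x samples) into overlapping sub time windows.

    win_s and step_s are the window length and step in seconds.
    """
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))
    n_samples = trial.shape[-1]
    starts = range(0, n_samples - win + 1, step)
    return [trial[..., s:s + win] for s in starts]
```

For example, a 4 s trial sampled at 100 Hz with a hypothetical 2 s window and 1 s step gives three windows starting at 0, 1, and 2 s, each of which would then be processed by its own filtering, CSP, LASSO, and SVM stage.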

B. Experimental Results
We carried out an extensive experimental comparison of our proposed method against other competing algorithms.
(2) FBCSP [39]: CSP features are extracted from EEG data within the whole time window at multiple frequency bands (4-8, 8-12, …, 36-40 Hz). Discriminative pairs of frequency bands and corresponding CSP features are automatically selected based on the Mutual Information Best Individual Feature (MIBIF) selection algorithm.
(3) SCSP [40]: Features are extracted by CSP from EEG data within the whole time window in the frequency band  and then optimized to obtain sparse CSP features by introducing an L 1 -norm regularization term in the optimization problem of CSP.
(4) SMFCSP: CSP features are extracted from EEG data within the whole time window at multiple frequency bands (8-13, 8-10, …, 23-30 Hz). The LASSO method is then used to obtain sparse features from multi-frequency filtered CSP features.
(5) CTFSP: The whole MI period is segmented into multiple sub time windows. In each sub time window, the SMFCSP algorithm is implemented.
We then used the LIBSVM classifier to classify the filtered EEG data after applying each of the above comparative methods [41]. The Radial Basis Function (RBF) is chosen as the kernel function, and the default classifier parameters are used. Note that, in our proposed CTFSP algorithm, three SVM classifiers are formed and the final decision is determined by their voting results.
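For readers working in Python, a comparable RBF-kernel classifier can be sketched with scikit-learn, whose `SVC` class is built on LIBSVM. This is a substitute for illustration only; the experiments in this study were run in MATLAB. The settings `C=1.0` and `gamma="auto"` (1 / number of features) are chosen to mirror the LIBSVM defaults, and the helper name is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def train_rbf_svm(features, labels):
    """Fit an RBF-kernel C-SVM with settings mirroring the LIBSVM defaults."""
    clf = SVC(kernel="rbf", C=1.0, gamma="auto")  # gamma = 1 / n_features
    clf.fit(features, labels)
    return clf
```

Note that scikit-learn's own default is `gamma="scale"`, which differs from the LIBSVM default, so the value is set explicitly here.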
For each of the three datasets, a 10-fold cross-validation approach is used to evaluate the classification performance. The number of CSP filters used is set to 4 (i.e., m = 2). Table II summarizes classification accuracies for all participants. For all three datasets, the proposed CTFSP algorithm achieved the highest classification accuracy among the five algorithms. The classification performance of the CTFSP algorithm was significantly better than the performance achieved with CSP (p < 0.005), SCSP (p < 0.005), and SMFCSP (p < 0.005), as assessed with a Wilcoxon signed-rank test.

C. Comparison of Feature Distributions
To facilitate the comparison of feature distributions, Fig. 3 shows the distributions of the two features obtained by CSP, FBCSP, SCSP, and SMFCSP for exemplar participant aw. The SMFCSP algorithm provided more easily separated feature distributions in comparison with the other three algorithms. This is consistent with the classification performance of the four algorithms. Fig. 4 shows the distributions of the two features with different sub time windows obtained by CTFSP, for exemplar participant aw. The two features with sub time window 1 are the least separable.

D. Classification Performance With Different Time Windows
Fig. 5 presents the classification accuracies obtained by applying the CSP algorithm to EEG data with different time windows for all the participants. The results show that the whole time window is not the best choice for all participants except participant aa. The classification accuracies with sub time window 2 are higher for participants aa, ay, k3, l1, a, and g. The classification accuracies with sub time window 3 are higher for participants al, aw, b, and f. The classification accuracy with sub time window 1 is higher for participant k6. Finally, the classification accuracies with sub time window 1 and with sub time window 3 are the same for participant av.
This shows that the response times to the cues for MI tasks are not the same for all participants. This, in turn, suggests that our multiple time window approach may reduce the risk of misclassification caused by selecting the wrong time window.
Fig. 6 describes the process of the frameworks with two different fusion strategies. We ran the proposed algorithm with classifier fusion (Fig. 6(a)) and with feature fusion (Fig. 6(b)). Table III lists the individual classification accuracies for all participants with different fusion strategies. We found that the feature fusion approach obtained a slightly higher mean accuracy of 84.57%. However, there was no significant difference between the two fusion strategies in terms of accuracy (p > 0.05), as measured with a Wilcoxon signed-rank test.

F. Computational Efficiency
We also investigated the computational efficiency of each of the compared algorithms on DS1. Fig. 7 shows the computational time evaluated with 10-fold cross-validation using MATLAB R2016b on a PC with an i5-9600K 3.7 GHz CPU and 16 GB of RAM. The results indicate that all of these algorithms can be implemented efficiently. In addition, Table IV lists the testing time needed by each of these algorithms for one trial for participant aa. Although our algorithm took longer to compute than the other methods, it could meet the requirement of real-time processing (it took less time to compute than the length of the trial). Most of the time cost of our method is spent on training multiple models. In other words, our proposed CTFSP algorithm achieved classification accuracy improvements without sacrificing computational efficiency for BCI applications.

IV. DISCUSSION
CSP is a spatial feature extraction method that has become the most commonly used algorithm in MI-BCIs [31]. However, it is also sensitive to noise and has low generalization capacity. Thus, many modifications to the CSP algorithm have been proposed to extend the generalizability of the CSP algorithm across the frequency domain to compensate for the   [42] and SBLFB [43]. In the current study, we used multiple frequency bands to extract CSP features, including the mu band, beta band, and their subbands. Then, our proposed method selected the most useful features from the set of extracted features by means of sparse regression.
Previous studies only considered the influence of the frequency domain and ignored the differences in response times to the cues during MI tasks. As shown in Fig. 5, the response latency of the MI tasks varied for each participant. Therefore, this study used a sliding time window approach and segmented the entire MI period into three sub time windows. We also compared the performance of the frameworks with two fusion strategies. The results show that the classification accuracy with the classifier fusion was not significantly different from that achieved with the feature fusion approach.
Although the proposed CTFSP algorithm obtained highly encouraging performance in comparison with other algorithms, this work can be further improved along the following lines. First, the feature selection method based on LASSO does not depend on any classifier, so it may omit some features that are useful for classification. Similarly, a previous study [44] showed that the sparse CSP algorithm generally gave lower performance than CSP. Therefore, there may be more effective feature-selection methods that can further improve classification performance. Second, in this study we only used the "Max Wins" voting strategy in the classifier fusion step. This might be the easiest classifier fusion strategy, but it is not necessarily the best. Other approaches such as minimum / maximum fusion or Dempster-Shafer fusion could be considered in future work [45], [46]. Third, motor imagery in fact produces not only sensorimotor rhythm (SMR) signals but also movement-related cortical potential (MRCP) signals [47]-[50]. Several signal-processing and classification methods have been used in low-frequency MRCP detection, including Locality Preserving Projection (LPP) [51], Discriminative Canonical Pattern Matching (DCPM) [52], and CSP [53]. Therefore, in future work, we will investigate the feasibility of the proposed CTFSP algorithm for simultaneous decoding of SMR and MRCP to reduce BCI inefficiency. Finally, transfer learning and deep learning have also received increasing attention in current BCI research [54], [55]. The combination of our work and these approaches may bring new vitality to the BCI field, and is worthy of further study.

V. CONCLUSION
This study proposed an integrated framework of multi-band filtering, CSP feature extraction, sparse feature selection, and classifier fusion, termed CTFSP. We evaluated our proposed algorithm on three public EEG datasets. Our results demonstrate that the CTFSP algorithm outperformed other competing algorithms, indicating that our proposed method is a promising framework to enhance the decoding of MI patterns, which is significant for the development of high-performance BCIs.