Common Spatial Pattern and Riemannian Manifold-Based Real-Time Multiclass Motor Imagery EEG Classification

Several motor imagery classification methods have been developed and achieve higher accuracy. Machine learning (ML) based algorithms utilizing manually designed features often encounter robustness issues, leading to diminished accuracy. While deep learning (DL) based algorithms exhibit promising accuracy, their extensive computational requirements present challenges in implementing them on portable devices, thereby restricting their practical applications. In this paper, we improve the ML-based algorithm’s feature robustness problems by combining common spatial patterns with Riemannian tangent space mapping, enhancing the algorithm’s feature quality. Furthermore, we introduce a method that utilizes the distance between data points and the SVM hyperplane to compute category scores, thereby enhancing classifier performance. Our experiment uses the BCI Competition IV 2A, BCI Competition III 3A, and a self-recorded dataset for subject-specific experiments to validate the algorithm’s classification performance. Experimental results show that the proposed algorithm achieves the best classification performance, with an accuracy of 78.55%, 83.33%, and 57.44% for BCI Competition IV 2A, BCI Competition III 3A, and the self-recorded dataset. Additionally, to assess the practicality of a real-time portable application, we implemented the proposed algorithm on Raspberry Pi and Jetson Nano, measuring their computation time and peak memory usage. The results demonstrate that our algorithm necessitates only 0.08 to 0.3 seconds of computation time and employs a mere 15MB of memory.


I. INTRODUCTION
A Brain-Computer Interface (BCI) bridges the brain and external devices by translating brain signals into corresponding commands.BCI system can assist the rehabilitation process of paralyzed patients and improve their self-care ability.Taking stroke patients as an example, stroke highly likely results in varying degrees of limb impairment in patients [1].Commonly employed clinical rehabilitation methods include physiotherapy (PT), occupational therapy (OT), robot therapy, electrical stimulation, pharmacological therapy, and The associate editor coordinating the review of this manuscript and approving it for publication was Sangsoon Lim .
virtual reality-assisted motor imagery therapy [2].However, for patients with severe limb paralysis, physical rehabilitation treatments (such as PT and OT) are difficult to perform [3].In contrast, motor imagery does not require physical movement and can facilitate brain neural network reorganization, thereby promoting impaired brain function recovery [2], [4].
In EEG-based BCI systems, EEG motor imagery classification algorithms can link the EEG signal and device command.ML-based motor imagery classification algorithms' architecture can be divided into three parts: preprocessing, feature extraction, and classification.Among these, feature extraction plays a pivotal role in the algorithm.Commonly utilized signal features encompass Fourier Transform, Event-Related Synchronization (ERS), Power Spectral Density (PSD), and Common Spatial Patterns (CSP).Nevertheless, features based on the Fourier Transform tend to ignore EEG's temporal pattern [7].Features based on ERS and PSD are susceptible to electrode channels [6], [7].CSP-based features are highly dependent on operation frequency bands [8].Consequently, manually engineered features also suffer from robustness issues [6], [9].
In the pursuit of achieving more accurate classification performance, DL-based algorithms have been explored.For instance, Zhang et al. [6] employed graph embedding to represent spatial nodes of EEG signals, followed by CNN for extracting spatio-temporal features.Subsequently, classification was conducted through attention layers and dense layers.Yang et al. [10] utilized Riemannian covariance as a feature and multi-layer perceptron for classification.However, algorithms based on deep learning often demand higher computational resources, rendering them challenging to implement on portable devices [2], [11], thus imposing limitations on practical applications.
This paper proposed the Discriminative Filter Bank Tangent Space Mapping and Common Spatial Pattern (DFBTSM-CSP) algorithm, which integrates Riemannian tangent space and CSP for EEG feature extraction, employing SVM as the classifier.During classification, we calculate category scores based on the distance between data points and the SVM hyperplane, followed by classification using these category scores.The main contributions of this paper and key advantages of the DFBTSM-CSP algorithm are summarized as follows: • We integrate CSP and Riemannian tangent space for feature extraction.CSP algorithm using optimal spatial filters maximizes inter-class variance, while Riemannian-based feature projection of EEG spatial covariance matrices onto the Riemannian tangent space enhances discriminability and robustness.Experimental results show that the proposed method achieves more accuracy due to the feature providing a more comprehensive analysis of EEG spatial information.
• The proposed methods' final classification decisions are based on category scores computed using the distance between data points and the SVM hyperplane.
Experimental results demonstrate that utilizing cate-gory scores for classification decisions outperforms traditional SVM, yielding superior classification performance.
• We implement frequency band selection using the Fisher ratio, reducing the number of frequency bands by approximately 20-40% and concurrently improving algorithm accuracy by around 2%.
• We validate the feasibility of real-time wearable applications by implementing the algorithm on an embedded system.The results reveal that the proposed algorithm requires a peak memory usage of only 14.85MB and a computation time of merely 0.2 seconds.
The remaining sections of this paper are as follows.Section II provides details of the proposed method.Section III introduces the dataset and the recording method.We present the experimental results and discussions in Section IV.
In Section V, we summarize the contributions of this paper and outline algorithm limitations.

II. METHODOLOGY
The proposed DFBTSM-CSP classification algorithm is shown in Fig. 1.The algorithm architecture can be divided into four parts, including preprocessing, filter bank and band selection, feature extraction, and classification.In the preprocessing phase, noise in the EEG signal is eliminated while retaining frequency bands relevant to motor imagery through the use of low-pass and band-pass filters.During the frequency band selection phase, the EEG signal is decomposed into multiple sub-bands, and the most discriminative frequency bands are chosen based on the training data for feature extraction.This paper employs CSP and Riemannian tangent space as signal features, followed by the computation of category scores based on the distance between the features and the SVM hyperplane to determine the classification outcomes.

A. DATA PREPROCESSING
The frequency range of motor imagery EEG signals is between 1 and 50Hz.Since the 60Hz power-line noise amplitude is much larger than the EEG signal, the power-line noise cannot be effectively removed with a band-pass filter.Therefore, a band-stop filter of 55 to 65 Hz is used to remove the 60Hz noise, and a band-pass filter of 1 to 50 Hz is used to filter out the unrelative signal.

B. FILTER BANK AND BAND SELECTION
The most discriminative frequency band of each subject is not the same [12], [13], so it is crucial to select the suitable frequency band.In this part, the EEG signal is split into different frequency bands through multiple band-pass filters, and the Fisher ratio band selection algorithm is used to select the most discriminative band.In this paper, frequency band selection was conducted using multiple narrowband and broadband sub-bands within the 4 to 35 Hz frequency range.Narrowband signals encompass more prominent motor imagery 139458 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.information and are less susceptible to noise interference, while broadband signals include some apparent features that can enhance motor imagery performance [14].Employing frequency band selection also facilitates the removal of less discriminative frequency bands.The band-pass filter bands of the filter banks used in this paper are respectively 4-10Hz, 7-13Hz, 11-17Hz, 15-21Hz, 18-24Hz, 21-27Hz, 24-31Hz, 27-34Hz, 31-37Hz, 4-14Hz, 13-23Hz, 22-32Hz, and 8-35Hz.
The Fisher ratio band selection algorithm [12], [15] ranks the bands according to inter-class variance and intra-class variance of the band power, and selects the most discriminative band.Firstly, the power of specific frequency band in each trial is calculated, which can be expressed as (1).
where P t,f is the power of the f-th frequency band of the t-th test, N is the total number of sampling points, x t,f (n) is the signal of the t-th sampling point of the f-th frequency band.
Then one calculates the average power of each frequency band and the average power of the specific frequency band of each category, as shown in ( 2) and ( 3) respectively.
It is noticed that m f in (2) is the average power of the fth frequency band, t is the number of trials, M is the total number of trials.In (3), m f ,c is the average power of the f-th frequency band of class C, t c is the trial of class C, M c is the total number of trials of class C, P t c ,f is the power of the f-th frequency band of the trials of class C. Accordingly, the intra-class variation and inter-class variation values under specific frequency band are calculated, as shown in ( 4) and ( 5) respectively.
Note that S W ,f in ( 4) is the intra-class variation value of the f-th frequency band, D is the total number of classes, C is the class.And S B,f is the inter-class variation value of the f-th frequency band.Therefore, the Fisher ratio F R,f can be obtained to evaluate the importance of the frequency band.The calculation of F R,f is given as.
where F R,f is the FR value of the f-th frequency band.The higher the FR value, the more discriminative the frequency band is.Therefore, the importance of each frequency band can be ranked according to the FR value.

C. FEATURE EXTRACTION 1) TANGENT SPACE MAPPING OF RIEMANNIAN MANIFOLDS
Compared with Euclidean space, Riemannian space can more accurately describe the correlation between high-dimensional EEG signals.This property may achieve good results in the classification of motor imagery signals [16], [17].Firstly, the sample covariance matrix is calculated as where P t ∈ R N ×N is the sample covariance matrix of the t-th test, N is the total number of sampling points, superscript T is the sign of transpose matrix, X t is the EEG signal of the t th test.
Then calculate the Euclidean mean by Eq. ( 8) as the initial value of the Riemann mean.
where P E is the Euclidean mean value and M is the total number of tests.Note that because the calculation of the Riemannian mean P R is not an analytical solution, it needs to be approximated by iterative method.We can project the point P i from Reimann to tangent space by performing the Riemann Log map operation through Eq. ( 9).And perform the Riemann Exp map operation through Eq. ( 10) to project the point S i from the tangent space to the Riemann space.
The Riemann mean algorithm [16] performs the Riemann Log map operation on all training data and calculates the average value of the tangent space S, and then performs the Riemann Exp map operation on S to obtain the Riemann mean value P R , and iterates until the change in P R is less than the tolerance ε.The Riemann mean algorithm is summarized in Fig. 2.After obtaining the Riemannian mean, the Riemannian manifold can map to the tangent space by Eq. ( 9).Thus, the Euclidean space vector [18] is extracted from the tangent space T s,t as a feature, and the extraction method can be expressed as where v t is the Euclidean space vector extracted from tangent space T s,t , upper (•) means the upper triangular part of the matrix, and the off-diagonal elements are multiplied by the weight √ 2. Note that the upper triangular elements are converted into a one-dimensional vector.
The extracted v t vector is used to select the better features using Mutual Information based Best Individual Feature (MIBIF) [19], [20].
Suppose there is a set of labels Y , and a set of feature vectors F, each of which contains d features, where d = , F = {f 1 , f 2 , . . .f d }, we need to find out the better k features from the d features by the score of mutual information.The mutual information formula is where I (f i ; Y ) is the mutual information between f i and Y , i = 1, 2, . . .d, H (f i ) is the entropy of f i , and are respectively given as ( 13) and (14).
where p (•) is the probability function.Through ( 12) to ( 14), one gets d number of the mutual message scores.Then k features with higher scores are chosen as the selected features 2) COMMON SPATIAL PATTERNS Common Spatial Patterns [21] is a way to maximize the difference in the variation of two categories time-domain signals.From multiple electrode channels, it can find a set of optimal spatial filter.The inner product of the spatial filter and time domain signals can make the variation of numerical difference between the different categories to maximize, and the variation of numerical differences between the same category to minimize.Firstly, the normalized covariance matrices of time domain signals of the two categories are calculated as.
where E 1 and E 2 are respectively the normalized covariance matrices of category 1 and category 2; X 1 and X 2 are respectively the time domain signals of category 1 and category 2; and trace is the trace of the calculation matrix.Then, the covariance matrix of the mixed space is given as where E is the covariance matrix of the mixed space.E 1 and E 2 are the average normalized covariance matrices of category 1 and category 2, respectively.The eigenvalue decomposition of the covariance matrix in the mixed space is done with the eigen-diagonal matrix arranged in descending order, as shown in (17).
where U is the eigenvector matrix of matrix E and λ is the eigen diagonal matrix.Then, the whitening matrix can be obtained as.
where P is the whitening matrix.Then perform a whitening transformation between E 1 and E 2 as (19).
139460 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
where S 1 and S 2 are respectively the average normalized covariance matrix of category 1 and category 2. The eigenvalue decompositions of S 1 and S 2 respectively have the eigen diagonal matrix of S 1 arranged in descending order, and that matrix of S 2 arranged in ascending order, as shown in (20).
It is noticed that B 1 = B 2 , so the eigenvector matrix of S 1 is the same as S 2 , and the sum of eigen diagonal matrices λ 1 and λ 2 is the identity matrix as Since the sum of the eigenvalues of the two categories is always one, the eigenvector corresponding to the maximum eigenvalue of S 1 will also correspond to the minimum eigenvalue of S 2 , and vice versa.Therefore, a spatial filter can be obtained to maximize the difference of variation values of different categories and minimize the difference of variation values of the same category.
The spatial filter is given as The spatial projection matrix W is the desired spatial filter.
Then one has the inner product of the spatial filter W with the original brain signal X as where Z is the spatially filtered signal.Thus, the first and last m column vectors of Z are used to calculate the variation values of the column vectors.Accordingly, 2m features can be obtained as where f p is the feature obtained by taking out the p −th column vector, and var(•) is the variation value.

D. CLASSIFICATION
In the feature extraction, we use a one-versus-one support vector machine [22] to train multiple binary classifiers to vote for the multi-classification task.We respectively perform SVM on the features from Reimann TSM and CSP.Then calculate the category scores through Eq. ( 25) and (26).Finally, the result with a higher category score is used as the classification result.The classification process of SVM distance analysis is shown in Fig. 3.
where w is the normal vector of the hyperplane, f is the input feature vector, b is the offset of the hyperplane relative to the origin, L x is the distance between the test point and the hyperplane, M x is the average distance between the training points and the hyperplane, and C is the predict category.

III. DATASET
This paper conducts experiments using two publicly available datasets (BCI Competition IV 2A and III 3A) along with a self-recorded dataset to validate the algorithm's performance through subject-specific experiments.The training and testing data of the datasets employed in this study were recorded from subjects on different days.In BCI Competition IV 2A dataset [23], there are 9 subjects containing four categories of motor imagery (left hand, right hand, feet and tongue).Each category contains 72 training sets and 72 testing sets, so there are 288 trials in the training set and the testing set The EEG signal was sampled at 250Hz and had 22 electrode channels.BCI Competition III 3A dataset [24] has 3 subjects and contains four categories of motor imaginary, (left hand, right hand, feet and tongue).The first subject (K3B) contains 90 trials in each category and a total of 360 trials, among which 180 trials are used as training set and the remaining 180 trials as testing set.As for the second subject (K6B) and third subject (L1B), each category contained 60 trials, a total of 240 trials, of which 120 trials were used as the training set and the remaining 120 trials as the test set.The brain signal was sampled at 250Hz with 60 electrode channels.Due to the excessive number of electrode channels, which greatly increases the computational burden, this study only uses electrode channels in the motor and somatosensory areas of the cerebral cortex, including electrode channels from 17 to 45, as shown in Fig. 4.
The self-recorded motor imaginary brain signal dataset includes five right-handed men, aged 23 or 24, who sat in a moderately high armchair, rested their hands on their legs when not moving, and did not blink during imaginary exercise.The data set consists of four categories, namely left hand, right hand, both hands or right foot and no action, which  The trial recording process is shown in Fig. 6.At the beginning of the test (t=0s), the screen goes blank; at 3 seconds, there are text and picture occur (t=3s); at 5 seconds, there are hints for motor imagery (t=5∼9s); at 9 seconds, the screen goes blank (t=9s).Then rest and wait for the next trial to begin.

IV. RESULT
This paper implements the algorithm using Python, and the experimental workflow is illustrated in Fig. 7.In assessing the algorithm's resource usage, we train the proposed model on a personal computer (PC) and subsequently deploy the model on embedded systems (Jetson Nano or Raspberry Pi) to evaluate computational time and memory usage.A subject-specific approach is employed to evaluate the algorithm's classification performance.In order to prevent the data leakage problem, training and testing data are recorded on different days, which is similar to the experimental procedures of prior studies [5], [25], [26], [27].
As shown in Table 1, the results indicate that the method proposed exhibits superior performance, with accuracies of 78.55%, 83.33%, and 57.44% on the BCI Competition IV 2A dataset, BCI Competition III 3A dataset, and the self-recorded dataset, respectively.Proposed Method 1 outperforms other methods, indicating that the proposed feature is more discriminative.Comparison between Proposed Method 1 and Proposed Method 2 shows that employing SVM distance-based category score computation yields better classification performance.Furthermore, contrasting Proposed Method 2 with DFBTSM-CSP reveals that removing non-discriminative data through frequency band selection reduces feature dimension and training noise, resulting in improved classification performance.the self-recorded dataset.Table 3 shows the classification accuracy increase by 2.25% on the BCI competition IV 2a dataset, 1.92% on the BCI competition IV 3a dataset, and 1.93% on the self-recorded dataset.

B. EFFECT ANALYSIS OF FREQUENCY BAND SELECTION
Summarizing the results of Table 2 and Table 3, the proposed DFBTSM-CSP architecture uses fewer computing resources and achieves better classification performance.

C. EFFECT OF SELECTING BAND IN EMBEDDED SYSTEM
The proposed DFBTSM-CSP architecture also has excellent performance in real-time classification.For brain signal measurement with 8 electrode channels, 1000Hz sampling frequency, sampling point of 3 bytes and the time duration of 4 seconds, the classification time and memory peak are 0.016∼0.026seconds and 13.83MB respectively on a PC with i5-9500 CPU and DDR4 2666MHz 16GB RAM.
The same experiment on Jetson Nano 2GB has classification time and memory peak as 0.077∼0.122seconds and 14.854MB respectively.Moreover, the classification time and memory peak respectively are 0.2∼0.307seconds and 14.851MB peak memory on Raspberry Pi 3B+.These experimental results show that the proposed classification algorithm implemented in real-time applications is feasible.

V. CONCLUSION
This paper integrates CSP and Riemannian tangent space as features and proposes the use of SVM distance analysis for classification.Through subject-specific experiments, this study validates the proposed DFBTSM-CSP algorithm on the BCI Competition IV 2A dataset, BCI Competition III 3A dataset, and a self-recorded dataset, achieving the highest accuracies of 78.55%, 83.33%, and 57.44%, respectively.To further investigate the impact of different components on algorithm performance, we compare Proposed Method 1 (without SVM distance analysis and band selection) and Proposed Method 2 (without band selection).The experimental results of Proposed Method 1 demonstrate that the features introduced in this paper are more discriminative compared to other methods.The results of Proposed Method 2 show that both SVM distance analysis and frequency band selection enhance the classifier's performance.Further comparison between Proposed Method 2 and DFBTSM-CSP reveals that frequency band selection leads to a 1-2% improvement in accuracy and reduces the use of frequency bands by 20-40%, making the algorithm more feasible for portable devices.This paper also implements the algorithm on embedded systems (Raspberry Pi and Jetson Nano) to verify practicality for portable applications.The results indicate that the computational time of the algorithm is only 0.08 to 0.3 seconds, and it utilizes approximately 15MB of memory.
While this paper focuses on implementing a multi-class motor imagery classification algorithm on portable devices, it does not consider cross-subject application scenarios.In the future, we hope to address the issue of reduced algorithm classification performance due to variations in signal characteristics among different subjects by integrating domain-adaptive algorithms [31].
Furthermore, this paper does not explicitly address crucial artifacts like eye-blinking noise.Currently, the algorithm relies on frequency band selection and uses multiple features to enhance robustness [12], [14], [15], [28], [32].In future works, there is an aspiration to integrate the algorithm with pertinent techniques [33], [34], [35] This collaborative approach is anticipated to provide a more comprehensive solution to mitigate the effects of intentional disturbances.

FIGURE 3 .
FIGURE 3. The process flow of SVM distance analysis classification.

Table 2 and
Table 3 make a comparison of proposed method 2 (without band selection) and DFBTSM-CSP (with band selection), and show the effectiveness of the band selection.Table 2 shows the number of frequency bands used is reduced by 19.16% on the BCI competition IV 2a dataset, 42.61% on the BCI competition IV 3a dataset, and 20% on Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

TABLE 1 .
Comparison of motor imagery classification algorithm performance.

TABLE 2 .
Number of bands comparison of the proposed method 2 and the DFBTSM-CSP.

TABLE 3 .
Classification accuracy comparison of the proposed method 2 and the DFBTSM-CSP.