Motor Imagery Classification for Asynchronous EEG-Based Brain–Computer Interfaces

Motor imagery (MI) based brain-computer interfaces (BCIs) enable the direct control of external devices through the imagined movements of various body parts. Unlike previous systems that used fixed-length EEG trials for MI decoding, asynchronous BCIs aim to detect the user’s MI without explicit triggers. They are challenging to implement, because the algorithm needs to first distinguish between resting-states and MI trials, and then classify the MI trials into the correct task, all without any triggers. This paper proposes a sliding window prescreening and classification (SWPC) approach for MI-based asynchronous BCIs, which consists of two modules: a prescreening module to screen MI trials out of the resting-state, and a classification module for MI classification. Both modules are trained with supervised learning followed by self-supervised learning, which refines the feature extractors. Within-subject and cross-subject asynchronous MI classifications on four different EEG datasets validated the effectiveness of SWPC, i.e., it always achieved the highest average classification accuracy, and outperformed the best state-of-the-art baseline on each dataset by about 2%.

potential. The paper focuses on MI, where the user imagines the movement of various body parts, e.g., the left/right hand, both feet, or the tongue, to elicit different EEG patterns and hence control external devices. It has been used in upper-limb robotic rehabilitation [5], text input [6], wheelchair control [7], etc.
Most existing MI-based BCIs use specific triggers to indicate the start and end of each MI trial, which may be inconvenient in practice. For example, when an MI-based BCI is used to navigate a wheelchair, the control commands should be sent out whenever the user wants, instead of only after some triggers.
Mason and Birch [8] first proposed the concept of asynchronous BCIs. As shown in Fig. 2, a subject generates control signals by consciously changing his/her mental state: when the subject starts MI, the BCI system detects it and executes the corresponding instruction; otherwise, it remains idle.
Asynchronous MI-based BCIs are challenging to implement, because the algorithm needs to first distinguish between resting-states (no MI) and MI trials, and then classify the MI trials into the correct task, all without any triggers. Very few studies have appeared in the literature. For example, Sugiura et al. [9] adopted a hierarchical hidden Markov model, and Saa and Cetin [10] proposed conditional random fields and latent dynamic conditional random fields for EEG classification in asynchronous BCIs.
This paper proposes a sliding window prescreening and classification (SWPC) approach for asynchronous MI-based BCIs, which consists of two modules: 1) a prescreening module, where a classifier with a fixed window length, trained with both supervised learning and self-supervised learning (SSL), is used to prescreen MIs from the resting-state; if the output probability exceeds a threshold, then the EEG trial is sent to the next module for classification; and 2) a classification module, where a classifier, also trained with supervised learning and SSL, is used for MI classification. Within-subject and cross-subject experiments on four MI datasets demonstrated the effectiveness of SWPC, particularly the use of SSL to refine the feature extractors.
The remainder of this paper is organized as follows. Section II introduces our proposed SWPC. Section III validates the performance of SWPC on four MI datasets. Finally, Section IV draws conclusions and points out future research directions.

II. METHODOLOGY
This section introduces our proposed SWPC for asynchronous MI-based BCIs. The code is publicly available at https://github.com/why135724/SWPC. The second training set, D̄^s = {(X̄_i^s, ȳ_i^s)}_{i=1}^{2n_s}, is used in the prescreening module to distinguish MI trials from the resting-state. It consists of the n_s labeled MI trials from D^s, and another n_s resting-state trials X̄_i^s ∈ R^{ch×ts} adjacent to the MI trials. ȳ_i^s ∈ {0, 1} (0 denotes resting-state, and 1 denotes MI) is the label of X̄_i^s. The test data X^t ∈ R^{ch×fl} is a long EEG data stream with fl time-domain samples, where usually fl ≫ ts. It does not include any triggers, so we do not know when an MI trial starts. The goal is to correctly identify the MI periods and further classify them into specific MI tasks.

A. Flowchart of SWPC
To simplify the problem, we split X^t with sliding window length L_w and step 10 to get n_t test trials D^t = {X_i^t}_{i=1}^{n_t}. Each trial X_i^t is then passed to the prescreening module, which outputs the probability p_i of X_i^t being MI. If p_i exceeds a threshold τ, as shown in Fig. 4, then X_i^t is further passed to the classification module. If multiple successive trials have p_i ≥ τ, then their corresponding classification probabilities are averaged as the final output. The predicted label for X_i^t is denoted as ŷ_i^t.
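The sliding-window flow above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `prescreen` and `classify` stand in for the trained prescreening and classification modules, and the step of 10 samples follows the text.

```python
import numpy as np

def sliding_windows(x, win_len, step=10):
    """Split a long EEG stream x (channels x samples) into overlapping test trials."""
    n_ch, n_samp = x.shape
    starts = range(0, n_samp - win_len + 1, step)
    return [x[:, s:s + win_len] for s in starts]

def swpc_predict(windows, prescreen, classify, tau=0.2):
    """Prescreen each window; windows whose MI probability exceeds tau are
    classified, and the class probabilities of successive accepted windows
    are averaged before taking the argmax."""
    preds, run = [], []          # run: class-probability vectors of the current accepted streak
    for w in windows:
        p = prescreen(w)         # probability that w contains MI
        if p >= tau:
            run.append(classify(w))
            preds.append(int(np.mean(run, axis=0).argmax()))
        else:
            run = []             # streak broken: the next acceptance starts a new average
            preds.append(-1)     # -1 marks resting-state
    return preds
```

With dummy modules, a 100-sample stream and a 50-sample window yield six windows (starts 0, 10, ..., 50), each prescreened and classified independently of the others except for the running average.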

C. The Prescreening Module
As shown in Fig. 5, the prescreening module includes first supervised training and then SSL. Supervised training performs binary classification between MI and resting-states. It trains a feature extractor f_θ and a classifier h_ψ on D̄^s, using the cross-entropy loss in (1). Both θ and ψ are updated with gradient descent. SSL is then used to fine-tune the feature extractor f_θ. It first constructs 2n_s transition trials (negative samples) according to (2), where X_r^s and X_m^s are a randomly selected resting-state trial and MI trial from D̄^s, respectively. Next, it updates the feature extractor f_θ and simultaneously another auxiliary feature extractor f_φ, on the positive samples D̄^s and the negative samples D̃^s = {X̃_i}_{i=1}^{2n_s}, using the contrastive loss in (3), where δ = 0.3 is a hyperparameter controlling the contribution of the negative samples, and σ = 2.0 determines the Gaussian kernel width. Note that the outputs of f_θ and f_φ are L2-normalized before entering (3). f_θ in (3) is optimized by gradient descent, whereas f_φ is optimized through an exponential moving average (EMA), as in (4), where λ = 0.9995. For a test EEG trial X_i^t, only f_θ and h_ψ are used to compute the prescreening probability p_i, as in (5). If p_i exceeds a threshold τ, then the EEG trial is further passed to the classification module.
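A minimal sketch of two SSL ingredients described above: constructing a transition trial from a resting-state trial and an MI trial, and the EMA update of the auxiliary extractor. The splice-at-a-random-boundary construction and the dictionary representation of the parameters are our assumptions for illustration; the paper's exact construction is given by its Eq. (2).

```python
import numpy as np

def make_transition_trial(x_rest, x_mi, rng):
    """Splice a resting-state trial and an MI trial along the time axis to
    form a negative 'transition' sample (one plausible construction)."""
    t = x_rest.shape[1]
    cut = rng.integers(1, t)  # random boundary between the two segments
    return np.concatenate([x_rest[:, :cut], x_mi[:, cut:]], axis=1)

def ema_update(phi, theta, lam=0.9995):
    """EMA update of the auxiliary extractor parameters phi towards the
    online extractor parameters theta: phi <- lam*phi + (1-lam)*theta."""
    return {k: lam * phi[k] + (1 - lam) * theta[k] for k in phi}
```

The EMA keeps f_φ a slowly moving copy of f_θ, which stabilizes the contrastive targets during fine-tuning.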

D. The Classification Module
As shown in Fig. 6, the training process of the classification module is similar to that of the prescreening module. It also consists of two steps: supervised training and SSL.
Supervised training of the classification module remains the same as supervised training of the prescreening module, except that the EEG trials are classified into different MI tasks, instead of MI and resting-state.
SSL is again used to refine the feature extractor f_θ. We replace the construction of negative samples in the prescreening module with data augmentation in the classification module. The following data augmentations are used in this paper: 1) Adding noise: uniform noise 0.5U(−δ, δ) is added to each element of the feature vector, where δ is the standard deviation of the original feature. 2) Scaling amplitude: each feature is scaled by 0.75 or 1.25. 3) Masking channels: all signals in some randomly selected EEG channels are set to 0. 4) Masking segments: some segments of the EEG signal are set to 0. For each trial X_i^s, we randomly select two different data augmentations to get X_{i,1}^s and X_{i,2}^s. {X_{i,1}^s}_{i=1}^{n_s} and {X_{i,2}^s}_{i=1}^{n_s} are then L2-normalized before computing the contrastive loss. For an input test trial X_i^t, the instantaneous classification probability p′_i is then computed.
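The four augmentations can be sketched as follows. The noise scale and the 0.75/1.25 scaling factors follow the text; the masking ratios `frac` are our assumptions, since the paper does not state them here.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x):
    """Add uniform noise 0.5*U(-d, d), where d is the std of the trial."""
    d = x.std()
    return x + 0.5 * rng.uniform(-d, d, size=x.shape)

def scale_amplitude(x):
    """Scale the whole trial by 0.75 or 1.25."""
    return x * rng.choice([0.75, 1.25])

def mask_channels(x, frac=0.2):
    """Zero out a random subset of EEG channels (frac is an assumed ratio)."""
    out = x.copy()
    idx = rng.choice(x.shape[0], max(1, int(frac * x.shape[0])), replace=False)
    out[idx, :] = 0.0
    return out

def mask_segments(x, frac=0.2):
    """Zero out one random contiguous time segment (frac is an assumed ratio)."""
    out = x.copy()
    seg = max(1, int(frac * x.shape[1]))
    start = rng.integers(0, x.shape[1] - seg + 1)
    out[:, start:start + seg] = 0.0
    return out
```

Two of these are drawn at random per trial to produce the pair of augmented views used by the contrastive loss.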

TABLE I SUMMARY OF THE FOUR MI DATASETS
To stabilize the output, we average all successive p′_i whose corresponding p_i exceed τ, i.e.,

p̄_i = (1 / (i − i_0 + 1)) Σ_{j=i_0}^{i} p′_j,   (9)

where i_0 is the smallest index that ensures all {p_j}_{j=i_0}^{i} exceed the threshold τ. p̄_i is the final prediction probability for X_i^t. More specifically, as shown in Fig. 7, the process of computing p̄_i is: we pass X_i^t ∈ D^t to f_θ and h_ψ to get p_i. If p_i < τ, then we classify the corresponding X_i^t as resting-state; otherwise, we further pass X_i^t to the classification module's feature extractor and classifier to get p′_i. p̄_i is then computed by (9) and used to derive ŷ_i^t. The pseudo-code of SWPC is given in Algorithm 1.
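The averaging rule in (9) can be sketched as follows, where `accepts[i]` indicates whether the prescreening probability p_i of window i exceeded τ, and `probs[i]` is that window's classification probability vector. This is a minimal illustration, not the paper's implementation.

```python
import numpy as np

def averaged_probability(probs, accepts):
    """For each window i, average the classification probabilities over the
    current run of successive accepted windows [i0, i]; return None for
    windows rejected by the prescreening module."""
    out = []
    i0 = 0
    for i, ok in enumerate(accepts):
        if not ok:
            i0 = i + 1           # streak broken: reset the smallest index i0
            out.append(None)
        else:
            out.append(np.mean(probs[i0:i + 1], axis=0))
    return out
```

A rejected window resets i_0, so each accepted streak is averaged independently of earlier streaks.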

III. EXPERIMENTS
This section evaluates the performance of our proposed SWPC on four public MI datasets in both within-subject and cross-subject classification.

A. Datasets
Four public datasets from BNCI-Horizon, summarized in Table I, were used in our experiments: 1) MI1 was the 001-2014 dataset [11], recorded from 9 subjects. Each session included 6 runs separated by short breaks. EEG signals were sampled at 250 Hz. Only two classes (left-hand and right-hand) were used. 2) MI2 was also the 001-2014 dataset, but with all four classes, i.e., left-hand, right-hand, feet, and tongue. 3) MI3 was the 002-2014 dataset [12]. Note that Subjects 5 and 6 in MI1, Subject 10 in MI3, and Subjects 2 and 3 in MI4 were removed, because their results were close to random.
All EEG signals were bandpass-filtered between 8 Hz and 30 Hz, and then notch-filtered at 50 Hz.
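This preprocessing can be reproduced along the following lines with SciPy. The 8–30 Hz band, the 50 Hz notch, and the 250 Hz sampling rate follow the text; the Butterworth filter order and the notch quality factor Q are our assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

def preprocess(eeg, fs=250.0):
    """Band-pass filter 8-30 Hz, then notch-filter at 50 Hz, applied
    zero-phase along the time axis of a (channels x samples) array."""
    b, a = butter(4, [8.0, 30.0], btype="bandpass", fs=fs)
    x = filtfilt(b, a, eeg, axis=-1)
    bn, an = iirnotch(50.0, Q=30.0, fs=fs)
    return filtfilt(bn, an, x, axis=-1)
```

Zero-phase filtering (`filtfilt`) avoids shifting the MI-related rhythms in time, which matters when windows are aligned against a true MI period.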

B. Performance Evaluation
The classification accuracy (ACC) was used as the performance metric. The specific computation details are illustrated in Fig. 8, where the yellow bar indicates the true MI period: 1) When all sliding windows during the true MI period have p_i ≥ τ, the p̄_i corresponding to the last sliding window with p_i ≥ τ is used to evaluate the classification accuracy. 2) When the true MI period is broken into two or more intervals, during each of which all p_i ≥ τ, the p̄_i corresponding to the last sliding window with p_i ≥ τ in the last interval is used to evaluate the classification accuracy. 3) When no sliding window in the MI period has p_i ≥ τ, the classification is counted as wrong.
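Under rules 1 and 2 above, the window used for scoring reduces to the last accepted window within the true MI period, and rule 3 handles the case where no window was accepted. A minimal sketch of that selection:

```python
def evaluation_index(accepts):
    """Given a list of booleans marking which sliding windows inside the
    true MI period were accepted (p_i >= tau), return the index of the
    window whose averaged probability is used for scoring, or None if no
    window was accepted (counted as a wrong classification)."""
    last = None
    for i, ok in enumerate(accepts):
        if ok:
            last = i
    return last
```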

C. Algorithms
SWPC used the EEGNet backbone. Supervised learning used a learning rate of 0.0005 and early stopping with patience 30. SSL used a learning rate of 0.00005 and 40 training epochs.
SWPC was compared with the following 11 approaches: 1) Continuous EEG classification (CEC) [14], which uses CSP and thresholding to identify MIs in EEGs.
2) Joint training scheme (JTS) [15], which combines the transitional imagery data (between the resting-state and MI) with the resting-state to train a binary classifier. 3) Bootstrap Your Own Latent (BYOL) [16]. 4) Simple framework for Contrastive Learning of Representations (SimCLR) [17]. 5) Momentum Contrast (MoCo) [18]. 6) ContraWR [19]. 7) Self-supervised contrastive learning (SSCL) [20], which was proposed for cross-session MI classification. 8) Ou2022 [21], an SSL approach for MI classification. 9) Model-agnostic meta-learning (MAML) [22], which learns a good initialization for fast adaptation. 10) Ensemble of averages (EOA) [23], which trains an ensemble of independent moving-average models. 11) Song2022 [24], an event-related desynchronization detection and false-positive rejection algorithm based on the time-frequency characteristics of MI. Note that CEC and Song2022 were proposed for asynchronous MI classification, so their original algorithms were implemented. The other 9 algorithms cannot be directly used for asynchronous BCIs, so they were embedded into SWPC. More specifically, BYOL, SimCLR, MoCo, ContraWR, SSCL and Ou2022 were used to replace the SSL part of SWPC (the supervised learning part was kept), MAML and EOA were used to replace the supervised training part of SWPC (the SSL part was removed), and JTS was only used in supervised training of the prescreening module.
Both within-subject and cross-subject classifications were performed. For within-subject experiments, Session 1 of a subject was used for training, and Session 2 (with all triggers removed) of the same subject for testing. For cross-subject experiments, Session 2 of a subject was used for testing, and Session 1 from all other subjects were combined for training. 40% of the training data were reserved as the validation set for determining the optimal hyperparameters, which were then applied to all training data to re-train the model. Except for CEC, which has no randomness, all other algorithms were repeated five times, and the average is reported.

D. Results
The ACCs and standard deviations (across different subjects) of different approaches on the four MI datasets in within-subject classification are shown in Tables II–V, respectively. The ACCs and standard deviations in cross-subject classification are shown in Tables VI–IX, respectively. The best results are marked in bold. Clearly, our proposed SWPC achieved the best average results on all four datasets in both within-subject and cross-subject classifications.
We also studied the effectiveness of SSL for the prescreening module and the classification module. Table X shows the MI identification accuracies of the prescreening module, with and without SSL. Table XI shows the ACCs of the classification module, assuming triggers are available. Clearly, SSL was always beneficial to both the prescreening module and the classification module.
Paired t-tests were performed to evaluate whether the performance improvements of SWPC over the other approaches were statistically significant in within-subject and cross-subject classifications. The results are shown in Tables XII and XIII, respectively, where p-values smaller than 0.05 are marked with asterisks. Most of the performance improvements were statistically significant; particularly, in cross-subject classification, SWPC statistically significantly outperformed each algorithm on at least three datasets.

E. Ablation Study
Ablation studies were performed to evaluate whether SSL in the prescreening module and the classification module, and the final averaging, are truly necessary and beneficial. Within-subject and cross-subject classification results are shown in

TABLE XVI ONLINE AND OFFLINE WITHIN-SUBJECT CLASSIFICATION ACCURACIES
Tables XIV and XV, respectively. Clearly, all three components were essential to the superior performance of SWPC.

F. Parameter Sensitivity Analysis
This subsection evaluates the sensitivity of SWPC to the time window length L_w and the prescreening threshold τ. The results are shown in Figs. 9 and 10, respectively. L_w = 1 and τ = 0.2 seemed to achieve the overall best performance on all datasets.

G. Offline Classification
All previous subsections considered online classification, i.e., the test data arrive on-the-fly. This subsection further considers offline classification, where all test EEG data are available in advance.
In offline classification, SSL on the test set D^t = {X_i^t}_{i=1}^{n_t} (instead of on the training set D^s, as in online classification) may be used to improve the performance. Specifically, we first conducted the SSL in Section II-C on D^t, and then the SSL in

TABLE XVII ONLINE AND OFFLINE CROSS-SUBJECT CLASSIFICATION ACCURACIES
Section II-D on D̃^t, which consisted of the EEG trials in D^t predicted as MI. Tables XVI and XVII show the average results on the four datasets in within-subject and cross-subject classification, respectively. Offline classification accuracies were higher than their online counterparts for all approaches in both scenarios, because SSL on the test data themselves extracted more tailored features.

IV. CONCLUSION AND FUTURE RESEARCH
Asynchronous MI-based BCIs aim to detect the user's MI without explicit triggers. They are challenging to implement, because the algorithm needs to first distinguish between resting-states and MI trials, and then classify the MI trials into the correct task, all without any triggers. This paper has proposed SWPC for MI-based asynchronous BCIs, which consists of two modules: a prescreening module to screen MI trials from the resting-state, and a classification module for MI classification. Both modules are trained with supervised learning followed by SSL. Within-subject and cross-subject asynchronous MI classification on four different EEG datasets validated the effectiveness of SWPC, particularly the use of SSL to refine the feature extractors.
Our future research directions include: 1) Transfer learning: transfer learning can further mitigate cross-subject and cross-session data discrepancies. For asynchronous BCIs, data alignment [25], source-free domain adaptation [26], and domain generalization [27] approaches may be used to further improve performance and protect user privacy. 2) Test-time adaptation: test-time adaptation updates the classifier using online unlabeled data to improve its performance. Our recent work [28]

Fig. 2 .
Fig. 2. Illustration of asynchronous MI classification. The user may switch between resting-state and MI at any unknown time.

Fig. 3.
Fig. 3 shows the flowchart of our proposed SWPC for asynchronous MI-based BCIs. It includes two modules: 1) Prescreening module, where an EEGNet [4] classifier with a fixed window length is trained to screen the MIs out of the resting-state EEG trials. 2) Classification module, where another EEGNet classifier is trained to classify the prescreened MI trials.

Fig. 8 .
Fig. 8. Illustration of computing the classification accuracy in testing.

Fig. 9 .
Fig. 9. Change of the classification accuracy w.r.t. the time window length L_w in (a) within-subject classification; and (b) cross-subject classification.

Fig. 10 .
Fig. 10. Change of the classification accuracy w.r.t. the prescreening threshold τ in (a) within-subject classification; and (b) cross-subject classification.

TABLE XII ADJUSTED p-VALUES OF PAIRED t-TESTS BETWEEN SWPC AND OTHER APPROACHES IN WITHIN-SUBJECT CLASSIFICATION

has demonstrated its promising performance in synchronous MI-based BCIs, but how to apply it to asynchronous BCIs requires further investigation. 3) More BCI paradigms: only MI was considered in this paper. It is interesting to study whether SWPC can be extended to other classical BCI paradigms, e.g., event-related potential and steady-state visual evoked potential.