Introduction
Brain-computer interfaces (BCIs) [1], [2], [3] are systems exchanging information between the brain and external devices. A BCI can recognize human intentions and emotional states by decoding brain signals into computer-readable information. In the literature, researchers have extensively studied brain signals of different categories as information sources for decoding valuable information, including electroencephalogram (EEG) [4], magnetoencephalography [5], and electrocorticography [6]. Specifically, EEG has gained considerable attention among these modalities due to its capacity for directly measuring cortical activities. It offers a relatively high temporal resolution and is conducive to noninvasive signal recording. EEG-based BCIs utilize different brain patterns, such as P300 [7], steady-state visual evoked potentials (SSVEPs) [8], and motor imagery (MI) [9]. Based on these paradigms, various BCI applications have been developed, including the rehabilitation of people affected by motor disabilities [10], external device control [11], and concentration level analysis [12].
In recent years, transfer learning [13] has become a promising method for EEG classification. It leverages auxiliary information from previous subjects to improve the learning performance of the new subject. Specifically, transfer learning mitigates the impact of insufficient training EEG data and reduces the individual differences between the source and target subjects. In general, existing transfer learning-based EEG classification methods are multi-source methods [14], [15], [16], i.e., exploiting knowledge from multiple source domains to improve the learning performance of the target domain. In this setting, there are two main learning strategies. The first strategy combines the data from different source subjects into a single source domain [17]. For example, by combining the data from existing subjects into one source domain, Liu et al. [17] proposed reducing the distribution differences between the source and target domains by minimizing the maximum mean discrepancy [18]. However, this strategy has a limitation: EEG signals vary across individuals, so the similarity between each source subject and the target subject may differ. Therefore, the second strategy allows different source subjects to have different influences on the target subject [19], [20]. For instance, Zhang and Wu [19] proposed estimating the domain similarity by utilizing the scatter matrix and Riemannian manifold, thus obtaining different weights for different source subjects. In [20], Liu et al. proposed an adversarial neural network using weighted fusion to represent the different correlations between the source and target subjects.
However, since EEG signals contain sensitive personal mental and health information, privacy concerns arise [21]. To address this issue, Zhang and Wu [22] proposed generating a virtual intermediate domain to reduce the distribution gap between different subjects. In [23], Zhang et al. introduced black-box settings where source models are only available for prediction querying and the model parameters are inaccessible. The above-mentioned methods mainly assume that the EEG data of the target subject are provided in advance. Nevertheless, in practice, the target subject may prefer to use the BCI promptly rather than undergo a time-consuming data collection process for calibration and adaptation; in such cases, the target EEG samples arrive in an online manner, which makes the above assumption inapplicable. Consequently, these models cannot be adopted for online privacy-preserving EEG classification.
Based on the above discussions, in this paper, we propose a novel model named Online Source-Free Transfer Learning (OSFTL), which simultaneously protects the privacy of the source subjects and makes online predictions on the target sequential trials. Specifically, the proposed learning procedure consists of offline and online stages. At the offline stage, we train multiple classifiers on the EEG samples from multiple source subjects to obtain multiple source model parameters. As a result, OSFTL only utilizes the source parameters and does not need to access the raw EEG data, thus preserving the privacy of the source subjects. At the online stage, OSFTL trains a target classifier based on the online sequence of EEG samples. Then, the source classifiers and the target classifier are ensembled by a weighted combination to achieve the final classifier. In addition, based on the similarities between each source classifier and the target classifier, OSFTL dynamically updates the transferred weight of each source domain to guarantee good transferability.
The contributions of this paper are summarized as follows:
We propose a novel model called OSFTL, which leverages online transfer learning for privacy-preserving EEG classification. Online prediction for EEG signals while protecting the privacy of subjects remains under-researched, and this work helps fill that gap for the BCI community.
We develop a novel weighting strategy, which dynamically captures the transferability of different source domains by measuring the classifier discrepancy in an online manner.
Extensive simulated online experiments on two public EEG datasets validate the effectiveness of the proposed method. Moreover, we conduct a real-world online experiment to further demonstrate the usability of the proposed method in practical applications.
The remainder of this paper is organized as follows. In Section II, we briefly review related work. We describe our method in Section III. Sections IV and V present the experimental results. In Section VI, we discuss the experimental results. Section VII concludes this paper.
Related Work
A. Offline Transfer Learning
Many existing BCI-related transfer learning studies focus on offline settings. The methods employed in these studies include but are not limited to enhanced classical methods, domain adaptation, and deep network transfer learning. For example, Common Spatial Pattern (CSP) [24] is a well-known method designed to extract spatial features from EEG signals. As transfer learning has become a popular strategy for addressing the problem brought by individual differences, researchers have developed enhanced methods to improve the adaptability of CSP when applied to new subjects with limited data. Regularized CSP methods, such as [25] and [26], introduce regularization penalties in the objective function to minimize the differences between inter-subject features. In [27], Dai et al. proposed a transfer kernel CSP to learn a domain-invariant kernel.
Domain adaptation [15] is a transfer learning technique that aims to adapt the data distribution between the source and target domains. Researchers have extensively applied domain adaptation to cross-subject EEG classification. Li et al. [28] proposed building a multi-task learning model using style transfer mapping for emotion recognition. Lan et al. [29] proposed handling a cross-dataset setting by applying maximum independence domain adaptation. A similarity measure based on the Kullback-Leibler (KL) divergence is proposed in [30] to improve motor imagery BCI systems. In [31], Liang and Ma proposed selecting subjects with similar features to the target subject as transfer sources through Riemannian geometry alignment. Joadder et al. [32] obtained the best feature set across different subjects by applying joint feature selection and feature fusion. Considering the conditional distribution discrepancy, Dagois et al. [33] proposed selecting recorded data from subjects whose class conditional probabilistic distributions are close to the new subject for reducing calibration requirements.
In recent years, researchers have also explored knowledge transfer methods based on deep neural networks, including fine-tuning [34], deep network adaptation [35], [36], and generative models [37]. Wu et al. [34] proposed fine-tuning a model pretrained on source domain data with different amounts of target data. In [35], Li et al. developed a network structure that obtains similar latent representations between the source and target domains for effective BCIs. Chen et al. [36] proposed a multi-subdomain adaptation network to solve the time-related distribution shift problem that occurs in motor imagery. In [37], Lee et al. handled the inter-task transfer learning problem by generating motor imagery data from motor execution data using a variational autoencoder.
Despite the success achieved by the above offline transfer learning models, they cannot make predictions for data arriving in an online manner. In addition, these approaches do not consider protecting the subjects' privacy.
B. Online Transfer Learning
Unlike offline transfer learning, online transfer learning (OTL) focuses on sequentially arriving data. Before proceeding to OTL, we briefly introduce online learning. Online learning handles scenarios where data are not collected in advance as a batch but arrive in a sequence. The Perceptron algorithm [38] is a well-known approach that updates the classification model whenever the label of an incoming sample is incorrectly predicted. The Passive Aggressive (PA) algorithm [39] updates the model based on both the classification confidence and whether the incoming sample is misclassified.
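To make the mistake-driven update concrete, the following is a minimal sketch of a Perceptron-style online round; the binary label convention y in {-1, +1} and the variable names are illustrative assumptions, not details taken from [38] or [39].

```python
import numpy as np

def perceptron_round(w, x, y):
    """One online round of the mistake-driven Perceptron update.

    w: current weight vector, x: incoming feature vector,
    y: true label in {-1, +1}, revealed after the prediction is made.
    """
    score = float(w @ x)
    y_hat = 1.0 if score >= 0 else -1.0
    if y_hat != y:          # update only when the prediction is wrong
        w = w + y * x
    return y_hat, w
```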
By combining the concepts of online learning and transfer learning, researchers proposed a framework called online transfer learning [40]. In addition to learning from the online data sequence, online transfer learning leverages knowledge from offline source data. In [40], Zhao et al. proposed transferring knowledge from a single source domain to improve the performance in the target task. Boosting-based methods such as [41] employ multiple source domains by adjusting the weights among source domains based on their similarities to the target domain. Considering the case of multiple source domains, Wu et al. [42] proposed exploiting knowledge from the source domains using an ensemble strategy on trained source classifiers. In [43], Yan et al. proposed handling the online heterogeneous transfer learning problem by leveraging labeled source data and unlabeled auxiliary co-occurrence data. In [44], multi-source online transfer learning is extended to deal with multi-class classification problems. Kang et al. [45] studied transfer strategy under the partial feedback setting where only the correctness of the predicted label is provided.
OTL has found applications in the field of biomedical signal classification. Ye et al. [46] incorporated hypergraphs [47], [48] and an OTL scheme to address cross-subject ECG-based emotion recognition. In [49], synthetic fMRI data generated by a generative adversarial network are used to warm up the online transfer learning model, reducing the number of target samples needed to train an online model with good performance. In [50], Zhang et al. combined offline discriminative feature extraction and OTL to address the time-varying problem of EEG signals. However, most existing OTL methods for EEG classification cannot preserve the privacy of subjects.
C. Source-Free Transfer Learning
Source-free transfer learning studies the paradigm of transfer learning where source data are inaccessible for privacy preservation, while the pretrained source models are available. In [51], Liang et al. employed the mutual information maximization strategy to handle the absence of guidance from source data. Ahmed et al. [52] further extended the algorithm proposed in [51] for the case of multiple source domains. Unlike [51] and [52], Yang et al. [53] focused on utilizing the local correlation between the target samples based on the observation that target samples form clusters in the pretrained source model’s feature space. Zhang et al. [54] proposed dividing the target samples into source-like and target-specific groups and using a different adaptation strategy for each group. Concerning privacy preservation in BCIs, Ju et al. [55] proposed a privacy-preserving framework named federated transfer learning for EEG classification. In [22], Zhang and Wu proposed generating a virtual intermediate domain to reduce the distribution gap between different domains. In [23], Zhang et al. further introduced black-box settings, in which source models are available only for prediction querying, and the model parameters are inaccessible. Although the above-discussed models protect the subjects’ privacy, they cannot make online predictions for EEG samples arriving in a sequence.
Methodology
A. Problem Definition
In this paper, we consider an online privacy-preserving EEG classification scenario where the samples of the target subject arrive in an online manner. In particular, the data of the source subjects are not provided and only the source model parameters are accessible. Given $n$ source domains (i.e., subjects), only a trained classifier with parameter vector $\mathbf{w}^{S_{i}}$, $i \in [1,n]$, is available from each source domain. The target samples $(\mathbf{x}_{t}^{T}, y_{t}^{T})$ arrive one at a time, and the goal is to predict the label of each incoming sample at every round.
B. Feature Extraction
A Butterworth bandpass filter is employed to filter the EEG data and obtain four frequency bands, namely the delta, theta, alpha, and beta bands (1–4, 4–8, 8–14, 14–30Hz). Subsequently, the power spectral density (PSD) of all channels is calculated for each frequency band using Welch’s approach to obtain feature representations. The reasons for using PSD as the feature are two-fold. First, PSD is widely used to represent the energy distribution of the brain. Physically, the PSD reflects the relationship between the power and frequency of the signal. Although CSP is a successful algorithm that is widely used in MI-based BCI, computing CSP filters requires collecting a batch of target subject data, which is not practical in the online scenario where no target data are available in advance. Therefore, we do not choose CSP as the feature extraction method. Second, the computation of PSD can be performed on a single EEG trial with low computational effort, making it a proper way to extract features from the online data sequence.
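For illustration, the sketch below implements this feature-extraction pipeline with SciPy: each trial is Butterworth band-pass filtered into the four bands, and the Welch PSD of every channel is averaged within each band. The sampling rate, filter order, and Welch segment length are illustrative assumptions rather than values specified in the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14), "beta": (14, 30)}

def psd_features(trial, fs=250, order=4):
    """trial: array of shape (n_channels, n_samples). Returns a 1-D feature vector.

    For each band, the trial is Butterworth band-pass filtered and the mean
    Welch PSD of every channel within that band is used as a feature.
    """
    feats = []
    for low, high in BANDS.values():
        b, a = butter(order, [low, high], btype="bandpass", fs=fs)
        filtered = filtfilt(b, a, trial, axis=-1)
        freqs, pxx = welch(filtered, fs=fs, nperseg=min(256, trial.shape[-1]), axis=-1)
        mask = (freqs >= low) & (freqs <= high)
        feats.append(pxx[:, mask].mean(axis=-1))   # one value per channel per band
    return np.concatenate(feats)
```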
C. The Proposed Method
Fig. 1 illustrates an overview of the proposed method, which consists of offline and online stages. At the offline training stage, we separately learn on the source domains to obtain subject-specific classifiers with parameters $\mathbf{w}^{S_{1}}, \ldots, \mathbf{w}^{S_{n}}$.
An overview of the proposed method. At the offline stage, source domain classifiers are trained using the EEG data collected from the source subjects. The locks imply that the source data are not accessible after obtaining the source classifiers. At the online stage, the target classifier for a new subject and the transferred weights of the source domains are jointly updated.
At the online learning stage, the main objective is to find a target classifier $\mathbf{w}^{T}$ that adapts to the sequentially arriving target samples. Following the Passive Aggressive framework, when the target sample $(\mathbf{x}_{t}^{T}, y_{t}^{T})$ arrives at round $t$, the target classifier is updated by solving \begin{align*} & \min _{\mathbf {w}^{T}} \frac {1}{2} \Vert \mathbf {w}^{T} - \mathbf {w}_{t}^{T}\Vert ^{2} + C \xi, \\ & s.t.\quad \begin{cases} \displaystyle l(\mathbf {w}^{T},(\mathbf {x}_{t}^{T},y_{t}^{T})) \leq \xi, \\ \displaystyle \xi \geq 0, \end{cases} \tag {1}\end{align*} where $\mathbf{w}_{t}^{T}$ is the current target classifier, $l(\cdot)$ denotes the hinge loss, $\xi$ is a slack variable, and $C$ is a trade-off parameter controlling the aggressiveness of the update.
Solving (1) yields the closed-form update \begin{equation*} \mathbf {w}_{t+1}^{T} = \mathbf {w}_{t}^{T} + \tau _{t} \mathbf {x}_{t}^{T} y_{t}^{T}, \tag {2}\end{equation*} where $\tau_{t}$ is the step size determined by the loss incurred on the current sample.
In the online learning scenario, we face the challenge of insufficient target data, which may lead to unsatisfactory performance of the target classifier obtained from the above learning process. To overcome this limitation, we aim to transfer knowledge from the source domains to assist the target classifier. To avoid negative transfer and determine the most suitable source domains for the target domain, it is essential to conduct a similarity analysis between the source and target domains. However, classical measures of distribution similarity, such as KL divergence and maximum mean discrepancy, are not applicable in the online BCI scenario because they require computations over a batch of data, whereas a new EEG trial arrives at each round and the complete distribution of the target domain features cannot be represented. To address this problem, we assume that source domains whose weight vectors are similar to that of the target classifier also share common information with the target domain, and that this shared information is beneficial for the target classification task. To incorporate the source classifiers differently in the prediction output, we use a weighted ensemble of the source and target classifiers as the final prediction. The transferred weights are initialized uniformly and then updated at every round. Specifically, the discrepancy between the $i$-th source classifier and the current target classifier is measured as \begin{equation*} \gamma _{i} = \Vert \mathbf {w}^{S_{i}} - \mathbf {w}_{t}^{T}\Vert ^{2}, \quad i \in [1,n], \tag {3}\end{equation*}
and the transferred weight of the $i$-th source domain is updated as \begin{equation*} u^{S_{i}} = \frac {e^{-\lambda \gamma _{i}^{2}}} {\sum _{k=1}^{n} e^{-\lambda \gamma _{k}^{2}}}, \tag {4}\end{equation*} where $\lambda$ is a scaling parameter that controls how strongly the weight decays as the classifier discrepancy grows.
The final prediction for $\mathbf{x}_{t}^{T}$ is then obtained by the weighted ensemble \begin{equation*} \tilde {y}_{t} = \sum _{i=1}^{n} u^{S_{i}} \hat {y}^{S_{i}}_{t} + \hat {y}^{T}_{t}, \tag {5}\end{equation*} where $\hat {y}^{S_{i}}_{t}$ and $\hat {y}^{T}_{t}$ denote the predictions of the $i$-th source classifier and the target classifier for $\mathbf{x}_{t}^{T}$, respectively.
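To make one online round of (1)-(5) concrete, here is a minimal linear sketch combining the PA-style target update, the distance-based weight refresh of (3)-(4), and the weighted ensemble of (5). The PA-I form of the step size, labels in {-1, +1}, and the use of signed predictions in the ensemble are illustrative assumptions.

```python
import numpy as np

def osftl_round(x, y, w_t, src_ws, C=1.0, lam=1.0):
    """One online round: ensemble prediction (5), weight update (3)-(4),
    and PA-style target update (1)-(2). Labels are assumed to be in {-1, +1}.
    """
    # Transferred weights from classifier discrepancies, Eqs. (3)-(4)
    gamma = np.array([np.linalg.norm(w_s - w_t) ** 2 for w_s in src_ws])
    u = np.exp(-lam * gamma ** 2)
    u = u / u.sum()

    # Ensemble prediction, Eq. (5) (signed predictions assumed here)
    src_preds = np.array([np.sign(w_s @ x) for w_s in src_ws])
    score = float(u @ src_preds + np.sign(w_t @ x))
    y_pred = 1.0 if score >= 0 else -1.0

    # PA update of the target classifier, Eqs. (1)-(2) (PA-I step assumed)
    loss = max(0.0, 1.0 - y * float(w_t @ x))            # hinge loss
    tau = min(C, loss / (np.linalg.norm(x) ** 2 + 1e-12))
    w_next = w_t + tau * y * x
    return y_pred, w_next, u
```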
The target parameter vector can further be kernelized to handle nonlinear classification. Specifically, a kernel function is defined as \begin{equation*} \mathcal {K}(\mathbf {x}_{i},\mathbf {x}_{j}) = \phi (\mathbf {x}_{i}) \cdot \phi (\mathbf {x}_{j}), \tag {6}\end{equation*} where $\phi(\cdot)$ maps an input sample into a high-dimensional feature space. Since the updates in (2) express the target parameter vector as a linear combination of the received target samples, \begin{equation*} {\mathbf {w}}_{t+1}^{T} = \sum _{i=1}^{t} \alpha _{i}\mathbf {x}^{T}_{i}, \tag {7}\end{equation*} where $\alpha_{i}$ denotes the coefficient accumulated for the $i$-th received sample, the decision value for a new sample $\mathbf{x}_{t+1}$ depends only on inner products: \begin{equation*} {\mathbf {w}}_{t+1}^{T} \cdot \mathbf {x}_{t+1} = \sum _{i=1}^{t} \alpha _{i} (\mathbf {x}^{T}_{i} \cdot \mathbf {x}_{t+1}). \tag {8}\end{equation*} Replacing the inner products with the kernel function yields the kernelized decision function \begin{equation*} f(\mathbf {x}^{T}_{t+1}) = \sum _{i=1}^{t} \alpha _{i} \mathcal {K}(\mathbf {x}_{i},\mathbf {x}_{t+1}), \tag {9}\end{equation*} which allows the target classifier to be learned and evaluated implicitly in the kernel-induced feature space.
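As a small illustration of (6)-(9), the snippet below evaluates the kernelized decision function with a Gaussian kernel over stored coefficients and received samples; the kernel width and the bookkeeping of the coefficients are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma=1.0):
    # Eq. (6) instantiated with a Gaussian (RBF) kernel
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))

def kernel_decision(x_new, support_x, alphas, sigma=1.0):
    """Eq. (9): decision value as a kernel expansion over received samples."""
    return sum(a * gaussian_kernel(xi, x_new, sigma)
               for a, xi in zip(alphas, support_x))
```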
Simulated Online Experiment
A. Datasets
We evaluate the proposed method on two publicly available datasets, namely BCI Competition IV-2b (referred to as BCI IV-2b) [56] and Clinical Brain-Computer Interface Challenge 2020 (referred to as CBCIC) [57]. Within each dataset, we consider the data from each subject as a domain. By selecting one domain as the target domain and the remaining domains as the source domains, we construct multiple transfer learning tasks. In a practical scenario, the target domain can be a new subject or a new data session from an existing subject.
BCI IV-2b: This dataset consists of five sessions (three training sessions and two testing sessions) from nine right-handed subjects performing right-hand and left-hand motor imagery. Following [58], the three training sessions are used in this study; they include a total of 400 trials. Sessions 1 and 2 each contain 120 trials recorded without feedback, while session 3 includes 160 trials recorded with feedback. All the data comprise six channels (three EEG channels and three EOG channels) recorded at a sampling frequency of 250 Hz, and only the EEG channels are used in this study.
CBCIC: The EEG signals from ten hemiparetic stroke patients with no prior experience of using a BCI system are recorded with 12 electrodes at a sampling rate of 512 Hz. A total of 120 trials are available. During each trial, the subjects perform 2-class (left or right hand) motor imagery tasks.
B. Compared Methods
To demonstrate the performance of our method, we compare it against several representative methods. We include the Passive Aggressive (PA) algorithm [39] since it is a classical online learning algorithm. However, PA does not exploit knowledge from the source domains, so a variant of PA, namely PAIO (PA initialized with the old classifier), is added to the comparative methods; PAIO is considered a single-source method. We also compare our method with HomOTL [40] and HomOTLMS [42]. HomOTL transfers knowledge from a single source domain via a mistake-driven Hedge algorithm, in which the source or target classifier receives a weight discount whenever it makes an erroneous prediction. HomOTLMS extends the OTL problem to the multi-source setting and uses the Hedge algorithm to learn the transferred weights of the source domains. For the two single-source methods, i.e., PAIO and HomOTL, we combine all the source domains into a new source domain for offline training. Besides, we consider source-only cases that use source classifiers to make predictions for the target samples directly. Specifically, we consider two cases, denoted by SO-c and SO-s. SO-c represents the combination case, where we combine all the source domains to train a single source classifier. SO-s represents the separated case, where we train a classifier for each source domain and report the best result among them.
C. Experimental Settings
Under the online learning setting, the model learns from the data sequence and provides predictions for each received data sample. This stands in contrast to conventional approaches, where predictions are made on a different testing dataset subsequent to the batch training process. We use each PSD feature to train the model and obtain a prediction result given by the model. The prediction error rate of the model on all data from the target subject is used as the evaluation criterion. We further process the extracted PSD features using the Gaussian kernel function. Different kernel parameters, denoted as
D. Experimental Results
Tables I and II present the results of our method and the comparative methods on the BCI IV-2b and CBCIC datasets, respectively. The best result among all the methods for each target domain is highlighted in bold. The performance of each method in a specific task is evaluated by the average error rate over 20 random permutations of the target data sequence. To analyze the performance differences further, we conduct paired t-tests over the 20 results of each task to determine whether our method differs significantly from the comparative methods, using a fixed significance level of 0.05. If the p-value is lower than this level, we consider the performance difference between the two methods to be significant.
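As a small illustration of this evaluation protocol, the snippet below runs a paired t-test over two methods' per-permutation error rates at the 0.05 level; the arrays are randomly generated placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
errs_osftl = rng.uniform(0.20, 0.30, size=20)     # placeholder per-permutation error rates
errs_baseline = rng.uniform(0.25, 0.35, size=20)  # placeholder comparative method

t_stat, p_value = ttest_rel(errs_osftl, errs_baseline)
significant = p_value < 0.05   # fixed significance level used in the paper
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, significant: {significant}")
```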
On BCI IV-2b, OSFTL achieves better performance than the comparative methods in all the tasks, except for subject 07. Additionally, OSFTL outperforms all comparative methods when considering the average result across all the tasks. Furthermore, OSFTL demonstrates a comparatively lower standard deviation than the second-ranked method, which suggests that our approach remains stable across varying orders of incoming target data. Besides, the results of the paired t-tests confirm that the performance advantage of our method is statistically significant ($p < 0.05$).
On CBCIC, OSFTL outperforms all the comparative methods in all the tasks except for subject 03, and also achieves the best average result. Again, we use paired t-tests with a significance level of 0.05 to assess the significance of the results. The results show that our method performs significantly better than the comparative methods in all the tasks except for subjects 03 and 07. In all the tasks where OSFTL outperforms the others, as well as in the average result, we observe a comparatively smaller standard deviation than the second-ranked method. It is worth noting that although HomOTLMS is a multi-source method, it still yields average results similar to those of HomOTL on CBCIC, whereas OSFTL performs well under the multi-source condition. This difference can be attributed to the nature of the CBCIC dataset, which contains a relatively small number of EEG trials. HomOTLMS employs the Hedge strategy [59] to modify the weights of the source classifiers, where a source classifier suffers a mild weight discount each time it makes a wrong prediction. Therefore, HomOTLMS needs an adequate amount of online received data to obtain reliable transferred weights. In contrast, OSFTL identifies the source classifiers that are beneficial to knowledge transfer directly through the classifier parameters, thus assigning appropriate weights to them.
E. Robustness Study
In practice, it cannot be guaranteed that all source domains will benefit the target task. To verify our method’s ability to find source domains that can effectively assist the target task, we introduce noise domains to the CBCIC dataset and compare the results of our method with those of HomOTLMS. Specifically, we increase the number of noise domains from 2 to 4, generate noise samples from Gaussian noise, and randomly label them as left or right hand to form the noise domains. The noise domains contain an equal number of samples of both classes. Table III shows the error rates of subject 04 on the CBCIC dataset for both methods across the original dataset and the dataset with noise domains. Table III further shows the performance variations observed between the normal and noisy datasets. To accurately measure the change in performance, we calculate the difference in performance by
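A minimal sketch of how such a noise domain can be constructed as described above; the number of trials and the feature dimensionality are placeholders, and an even number of trials is assumed so that the two classes are exactly balanced.

```python
import numpy as np

def make_noise_domain(n_trials, n_features, rng=None):
    """Gaussian-noise samples with balanced, randomly assigned {-1, +1} labels.

    n_trials is assumed to be even so that both classes have the same size.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = rng.standard_normal((n_trials, n_features))
    y = np.repeat([-1, 1], n_trials // 2)   # equal number of samples per class
    rng.shuffle(y)                          # random assignment of labels to trials
    return X, y
```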
F. Online Prediction Performance
We study the behaviors of the test methods under varying numbers of training samples to analyze the ability of online transfer learning methods to adjust the model according to the incoming data. We present the results on the BCI IV-2b dataset in Fig. 2, where the error rates of most online transfer learning methods gradually decrease as the number of samples increases. This trend suggests that online transfer learning methods are well-suited for online motor imagery-based BCI scenarios. Among all the methods, our method eventually reduces the error rate to a minimum on most tasks, which aligns with the results presented in Table I. Furthermore, for most subjects, OSFTL maintains the lowest and most consistently decreasing error rate once the number of samples reaches the range of 50 to 100. These results demonstrate the effectiveness of the proposed model.
Average error rates of different methods with the increase of target domain samples on several representative subjects on the BCI IV-2b dataset.
G. Comparison With Offline Source-Free Transfer Learning Methods and Test Time Adaptation Methods
To provide a detailed analysis of OSFTL, we compare it with existing offline source-free transfer learning methods and test time adaptation methods [60]. Specifically, for offline source-free transfer learning methods, we choose Source HypOthesis Transfer (SHOT) [51] and Lightweight Source-Free Transfer (LFST) [22] as compared methods. SHOT is a method based on deep neural networks. It initializes the target domain model using the parameters of the source domain model and employs information maximization and pseudo-labeling strategies to achieve source-free transfer learning. LFST is a multi-source method. It constructs a virtual intermediate domain by selecting samples with small prediction inconsistencies from the source domain models and utilizes feature adaptation to reduce the domain discrepancy. For test time adaptation methods, we adopt Tent [61], T3A [62], and T-TIME [60] as compared methods. Tent optimizes the model’s transformation parameters in the test time through prediction entropy minimization. T3A utilizes both pseudo-labeling and prototype updating to adapt to the online data. T-TIME considers both conditional entropy minimization and label marginal distribution regularization to trim the trivial solution where all test data are classified into a single class.
Following [60], we perform Euclidean Alignment [63] before training and combine all the source data into one domain for SHOT and the test time adaptation methods. We also adopt incremental Euclidean Alignment [60] for the test time adaptation methods. For SHOT and the test time adaptation methods, we use EEGNet [64] as the backbone. Table IV presents the error rates (%) on the CBCIC dataset. Compared with the test time adaptation methods, OSFTL shows small standard deviations, indicating that it is more stable under different data permutations. Furthermore, both multi-source methods, i.e., OSFTL and LFST, achieve better performance, demonstrating the importance of multi-source knowledge transfer. Note that the proposed OSFTL method is a shallow model that requires fewer parameters than neural network-based models.
H. Parameter Sensitivity Study
The proposed OSFTL model involves a tunable parameter
I. Ablation Study
We perform ablation studies to analyze the ensemble strategy of the proposed method. Specifically, we take the tasks on the CBCIC dataset as examples and present the results in Table V, where we denote the case of using uniform weights without updating by OSFTL-u. We observe that OSFTL outperforms OSFTL-u, which validates the effectiveness of the ensemble strategy by measuring distances between the source and target classifiers. We also perform a t-test to show that OSFTL significantly outperforms OSFTL-u on eight tasks.
J. Influence of Different Distance Measures
Since OSFTL employs a distance-based weighting strategy, it is meaningful to investigate the influence of different distance measures on its performance. We study the performance of OSFTL using the Euclidean distance, cosine similarity, and Chebyshev distance. Table VI presents the results averaged over subjects. The Euclidean distance yields the best results on the CBCIC dataset and results comparable to the cosine similarity on the BCI IV-2b dataset. The behavior of the cosine similarity may be attributed to the fact that it considers only the direction of the vectors and is insensitive to their magnitude. The Chebyshev distance may not be appropriate for capturing the domain discrepancy, resulting in its less satisfactory performance.
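For reference, the three measures compared here can be computed on a pair of classifier weight vectors as follows; the vectors are placeholders and SciPy's distance functions are used for brevity.

```python
import numpy as np
from scipy.spatial.distance import euclidean, cosine, chebyshev

w_src = np.array([0.2, -0.5, 1.0])   # placeholder source classifier weights
w_tgt = np.array([0.1, -0.4, 0.8])   # placeholder target classifier weights

d_euclidean = euclidean(w_src, w_tgt)   # sensitive to both magnitude and direction
d_cosine    = cosine(w_src, w_tgt)      # 1 - cosine similarity; direction only
d_chebyshev = chebyshev(w_src, w_tgt)   # largest single-coordinate gap
```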
K. Influence of Euclidean Alignment
We analyze the influence of Euclidean Alignment (EA) [63] on OSFTL and present the average results in Table VII. EA is a transfer learning technique specially designed for BCIs: it aligns the covariance matrices of all EEG trials from a subject such that the arithmetic mean of the aligned covariance matrices equals the identity matrix. Many cross-subject methods for BCIs apply EA before the subsequent machine learning, such as [19], [22], and [60]. Following [60], we apply EA and Incremental EA (IEA) to OSFTL. IEA is a variant of EA designed for online scenarios: whereas EA uses all available trials of a subject to compute its reference matrix, IEA adopts an incremental averaging strategy and uses the average of the covariance matrices received so far as the reference matrix. Table VII shows that OSFTL may not benefit much from EA and IEA.
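A minimal sketch of EA as described above: the reference matrix is the arithmetic mean of the trial covariance matrices, and each trial is whitened by its inverse square root. The eigendecomposition-based matrix square root and the small regularization term are implementation choices, not details from [63].

```python
import numpy as np

def euclidean_alignment(trials):
    """trials: array of shape (n_trials, n_channels, n_samples).

    Returns aligned trials whose average covariance matrix is (approximately)
    the identity, as described for EA above.
    """
    covs = np.stack([X @ X.T / X.shape[-1] for X in trials])
    R = covs.mean(axis=0)                                 # reference matrix
    eigvals, eigvecs = np.linalg.eigh(R)
    R_inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + 1e-12)) @ eigvecs.T
    return np.stack([R_inv_sqrt @ X for X in trials])
```

IEA would simply replace the reference matrix with the running average of the covariance matrices received so far.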
L. Interaction of Different Individuals’ Data
Typically, EEG data are collected from multiple subjects, resulting in multiple source domains for transfer learning. However, combining the source domains can incorporate interactions among individual data, possibly improving performance. Here, we investigate whether combining the sources, instead of independent training, could improve the performance of OSFTL. Specifically, we combine every three source domains into a new source domain for each target. Table VIII indicates that the combination case experiences performance degradation. According to [45], noise or negative knowledge generated by one source domain can effectively be corrected by other source domains. Combining data from different individuals reduces the number of independent source domains, potentially weakening the ability to correct negative knowledge.
Real-World Online Experiment
A. Experimental Setup
Six subjects, denoted as S01, S02, ..., S06, participate in the real-world online experiment.
Prior to the online experiment, we collect motor imagery data from all the subjects, which serve as the source domains. During the online experiment, each subject is treated as a new target subject, and all data except the current target subject's own data are treated as the source domains to assist the new task. Each source domain contains 100 trials of EEG signals, consisting of 50 trials of left-hand imagery and 50 trials of right-hand imagery. Each trial begins with a 3 s preparation stage, during which a fixation cross is presented in the middle of the screen. Then, a prompt appears on the screen for 4 s, indicating whether the subject should imagine a left-hand or right-hand movement, and the subject performs the corresponding imagery task for this duration. Before the next trial starts, there is a 2–3 s pause. Fig. 4 shows the timing diagram described above.
In the online experiment, the subjects are instructed to control the movement of an object on the screen by performing the corresponding motor imagery according to the given cue. Each round of the experiment consists of an imagery period and a rest period. At the beginning of the imagery period, an image of a soccer ball is placed in the center of the screen, colored squares representing goals are placed on both sides of the screen, and text at the top of the screen indicates the specific imagery task the subject should perform. The subject's task is to imagine the movement according to the given cue. As the subject performs a movement attempt, the online decoder provides a real-time prediction of the subject's imagery, and based on these predictions, the soccer ball moves a certain distance in the corresponding direction. It takes four moves for the ball to reach either side of the screen from its initial position. The duration of the imagery period is therefore not fixed and varies depending on how much time the subject takes to move the ball to the goal; consequently, the length of the online data sequence within an imagery period also varies. This setup reflects the practical application scenario of BCI, where the duration of the task depends on the user's performance. After the soccer ball reaches either side, the imagery period ends regardless of whether the ball enters the correct goal, which allows the model to receive class-balanced training samples. After a rest period of 4 seconds, the next round begins, and the cue instructs the subject to move the ball in the direction opposite to the one it reached in the previous round.
B. Data Acquisition
EEG data are acquired from 32 scalp sites (extended 10-20 system) using a cap with active Ag/AgCl electrodes (Quickcap32). Wet electrodes are used in the cap, and the electrode impedance is modulated to less than
To handle the data sequences of varying lengths mentioned in the previous subsection, we implement a sliding window approach. The window has a fixed length of 4 seconds, which corresponds to 4000 sample points for each electrode. Every 1 s, the 1000 newly received sample points from each electrode are added to the window while the oldest 1000 sample points are discarded. The data in the window then serve as the model input. This strategy allows the model to work on a continuous data stream and increases the number of available training samples. Once the model has received a total of 200 training samples from the window, the online experiment ends and the error rate over this period is calculated to evaluate performance. To help the subjects adapt to the online experiment, we provide two sessions for each subject, allowing them to become familiar with the experimental setup and the tasks involved.
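A minimal sketch of this sliding-window buffering, assuming a 1000 Hz stream (4000 points per 4 s window, 1000 new points per 1 s hop); the class and method names are illustrative.

```python
import numpy as np
from collections import deque

FS = 1000            # samples per second implied by 4000 points in a 4 s window
WIN = 4 * FS         # window length in samples
HOP = 1 * FS         # new samples appended every second

class SlidingWindow:
    """Keeps the most recent 4 s of multichannel EEG as the model input."""

    def __init__(self, n_channels):
        self.buf = deque(maxlen=WIN)   # each element: one sample across channels
        self.n_channels = n_channels

    def push(self, chunk):
        """chunk: array of shape (n_channels, HOP) with the newest 1 s of data."""
        for t in range(chunk.shape[-1]):
            self.buf.append(chunk[:, t])   # oldest samples drop out automatically

    def ready(self):
        return len(self.buf) == WIN

    def window(self):
        # shape (n_channels, WIN), suitable as a single model input
        return np.stack(list(self.buf), axis=-1)
```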
C. Experimental Results
Table IX presents the error rate results of all six subjects in both sessions. Our method records an average error rate of
Error rates of the proposed method with the increase of target domain samples on six subjects.
Discussion
As shown in Fig. 2, at the beginning of the online learning stage, some tasks show an evident performance improvement while others do not. We interpret this observation as follows. For some tasks, it is difficult for online learning methods to optimize over the data sequence, possibly due to the instability of online learning or an inadequate number of samples; thus, the performance of these methods may not change much during the learning process. However, it is still possible to improve the performance through knowledge transfer, and this improvement can be observed in the initial phase because the learned source classifiers help make correct predictions. Tables I and II also show that OSFTL improves over SO-c and SO-s, indicating that the online learning process itself is beneficial. For the self-collected dataset, it may be more challenging to train an ideal target classifier due to the subjects' lack of experience in MI and the difficulty of the online task.
During the online learning process, the target classifier may give a completely erroneous prediction to the incoming sample, which may lead to a wrong ensemble result. However, the more significant the discrepancy between the prediction and the true label is, the greater the loss incurred by the model according to the hinge loss function. Thus, subsequent model updates are able to rectify an incorrect prediction.
Conclusion
In this paper, we consider both the privacy-preserving problem and the cross-subject problem for online BCI applications. To utilize the information in existing auxiliary data, we build an offline classifier for each source subject. Out of concern for privacy preservation, the classifiers trained on the source subjects' data, rather than the data themselves, are used to improve the performance of the target task. At the online stage, we train a target classifier on the EEG data sequence from the target subject and conduct online knowledge transfer by combining the source and target classifiers through a weighting strategy. In each round, we dynamically update the weights based on the distance between classifier vectors to reflect the transferability of each source subject in an online fashion. Experimental results demonstrate the effectiveness of our method.
Currently, OSFTL has some limitations, primarily in terms of feature extraction methods and BCI paradigms. In the future, to expand the usability of our method, different feature extraction methods can be considered. It is also worthwhile to investigate the usability of our method on other EEG paradigms, such as P300 and SSVEP.