Multi-Stimulus Least-Squares Transformation With Online Adaptation Scheme to Reduce Calibration Effort for SSVEP-Based BCIs

Steady-state visual evoked potential (SSVEP), one of the most popular electroencephalography (EEG)-based brain-computer interface (BCI) paradigms, can achieve high performance using calibration-based recognition algorithms. Because collecting calibration data for such algorithms is time-consuming, the least-squares transformation (LST) has been used to reduce the calibration effort for SSVEP-based BCIs. However, the transformation matrices constructed by current LST methods are not precise enough, resulting in large differences between the transformed data and the real data of the target subject. This ultimately leads to spatial filters and reference templates that are not effective enough. To address these issues, this paper proposes the multi-stimulus LST with online adaptation scheme (ms-LST-OA). Methods: The proposed ms-LST-OA consists of two parts. Firstly, to improve the precision of the transformation matrices, we propose the multi-stimulus LST (ms-LST), which uses a cross-stimulus learning scheme for cross-subject data transformation. The ms-LST uses the data from neighboring stimuli to construct a higher-precision transformation matrix for each stimulus, reducing the differences between the transformed data and the real data. Secondly, to further optimize the constructed spatial filters and reference templates, we use an online adaptation scheme that learns additional features of the target subject's EEG signals through a trial-by-trial iterative process. Results: The performance of ms-LST-OA was measured on three datasets (Benchmark Dataset, BETA Dataset, and UCSD Dataset).
Using only a small amount of calibration data, ms-LST-OA achieved ITRs of 210.01 ± 10.10 bits/min, 172.31 ± 7.26 bits/min, and 139.04 ± 14.90 bits/min on the three datasets, respectively. Conclusion: Using ms-LST-OA can reduce the calibration effort for SSVEP-based BCIs.


I. INTRODUCTION
ELECTROENCEPHALOGRAPHY (EEG)-based brain-computer interfaces (BCIs) provide people with disabilities a new approach to interact with the outside world that does not rely on peripheral nerves and muscles, but rather decodes brain activity directly [1], [2]. Steady-state visual evoked potential (SSVEP) is one of the most popular EEG-based BCI paradigms due to its high information transfer rate (ITR) and signal-to-noise ratio (SNR) [3], [4]. In practice, an SSVEP-based BCI encodes different commands by displaying visual stimuli flashing at different frequencies on a monitor. The selected command is recognized by detecting the specific frequency component of the EEG signal [5]. In the past decades, SSVEP has been shown to have great application potential, e.g., spellers [6], [7], disability assistance [8], [9], smart homes [10], [11], and gaming [12], [13].
In previous studies, calibration-based recognition algorithms have received significant attention. With the continuous development of recognition algorithms, many excellent calibration-based recognition algorithms have been proposed, e.g., extended canonical correlation analysis (eCCA) [14], ensemble task-related component analysis (eTRCA) [15], and task-discriminant component analysis (TDCA) [16], all of which achieve remarkable performance. However, calibration-based recognition algorithms require a large amount of calibration effort, which results in significant time consumption and user fatigue [17].
Transfer learning is a common solution for reducing calibration effort, and the cross-subject setting is a popular transfer learning scenario [18]. Least-squares transformation (LST), a cross-subject transfer learning approach, uses a small amount of calibration data from the target subject to reduce calibration effort by transforming existing data from source subjects into calibration data for the target subject [19]. Small-data LST (sd-LST), a variant of LST, uses even less calibration data than LST. Unlike LST, sd-LST constructs a common transformation matrix over all stimuli for each source subject, and the data from all stimuli are transformed using these common transformation matrices [20]. Although these LST methods greatly reduce calibration effort and achieve high performance, some problems remain: the transformation matrices constructed by current LST methods are not precise enough.

TABLE I
THE MAIN DIFFERENCES OF LST, SD-LST, AND MS-LST
For LST, the effect of noise leads to imprecisely constructed transformation matrices when the calibration data are scarce. For sd-LST, the constructed common transformation matrix may not be the most appropriate for each stimulus due to the differences between stimuli. This results in differences between the transformed data and the real data, ultimately leading to spatial filters and reference templates that are not effective enough.
To solve these problems, we propose the multi-stimulus LST with online adaptation scheme (ms-LST-OA) to reduce the calibration effort for SSVEP-based BCIs. This approach addresses the problems mentioned above as follows: 1) The study in [21] shows that the SSVEP impulse responses elicited by stimuli of neighboring frequencies are similar. Thus, a cross-stimulus learning scheme using data from neighboring stimuli can improve the transformation matrices constructed by LST and the robustness to insufficient calibration data [22]. We therefore use the cross-stimulus learning scheme to improve the current LST, yielding ms-LST. Different from LST and sd-LST, ms-LST utilizes not only the target stimulus but also data from neighboring stimuli (the stimuli that flash at frequencies near that of the target stimulus) to construct a more precise transformation matrix for each stimulus. Each transformation matrix is applied only to the stimulus of its specific frequency. This improvement yields better transformed data. The differences between LST, sd-LST, and ms-LST are displayed in Table I.
2) Previous studies have suggested that an online adaptation scheme can improve BCI performance [23], [24]. In this paper, we switch the recognition algorithm to an online learning mode for recognizing EEG signals, hence the name ms-LST-OA. In the online learning mode, the spatial filters and reference templates are continuously optimized by learning additional EEG features of the target subject each time an EEG signal is recognized. The experimental results show that ms-LST-OA can reduce the calibration effort for SSVEP-based BCIs and achieve a high ITR.
The remainder of this paper is organized as follows: Section II introduces some preliminaries of SSVEP-based BCIs. Section III presents the ms-LST-OA. Section IV describes the details of the experiments, and the results are presented in Section V. Finally, Section VI concludes this paper.

II. PRELIMINARIES
This section introduces some approaches, including CCA-based approaches, TRCA-based approaches, and LST.

A. CCA-Based Approaches
CCA is a popular calibration-free recognition algorithm for SSVEP-based BCIs, first introduced to SSVEP recognition by Lin et al. [25]. For an unlabeled EEG signal $X \in \mathbb{R}^{N_{ch} \times N_s}$ and a reference template $Y_k \in \mathbb{R}^{2N_h \times N_s}$, CCA calculates the correlation coefficient between them to extract the frequency component of the EEG signal, where $N_{ch}$ is the number of EEG channels, $N_h$ is the number of harmonics ($N_h = 5$ in this paper), and $N_s$ is the number of sampling points. $Y_k$ is the sine-cosine signal of the $k$-th stimulus, i.e.,

$$Y_k = \begin{bmatrix} \sin(2\pi f_k t + \phi_k) \\ \cos(2\pi f_k t + \phi_k) \\ \vdots \\ \sin(2\pi N_h f_k t + N_h \phi_k) \\ \cos(2\pi N_h f_k t + N_h \phi_k) \end{bmatrix}, \quad t = \frac{1}{F_s}, \frac{2}{F_s}, \cdots, \frac{N_s}{F_s}, \quad (1)$$

where $F_s$ is the sampling rate, and $f_k$ and $\phi_k$ are the frequency and phase of the $k$-th stimulus, respectively.
To compute the correlation coefficient between $X$ and $Y_k$, CCA finds two weight vectors $w_X \in \mathbb{R}^{N_{ch} \times 1}$ and $w_{Y_k} \in \mathbb{R}^{2N_h \times 1}$ by solving the following problem:

$$\max_{w_X, w_{Y_k}} \rho\left(w_X^T X,\; w_{Y_k}^T Y_k\right) = \frac{w_X^T X Y_k^T w_{Y_k}}{\sqrt{\left(w_X^T X X^T w_X\right)\left(w_{Y_k}^T Y_k Y_k^T w_{Y_k}\right)}}. \quad (2)$$

After that, the correlation coefficient between the EEG signal $X$ and the reference template $Y_k$ is taken as the maximum of (2):

$$r_k = \max_{w_X, w_{Y_k}} \rho\left(w_X^T X,\; w_{Y_k}^T Y_k\right). \quad (3)$$

Finally, the frequency of the EEG signal $f_t$ can be determined by the following:

$$f_t = f_{k^*}, \quad k^* = \arg\max_{k} r_k, \quad k = 1, 2, \cdots, N_f, \quad (4)$$

where $N_f$ is the number of stimuli.

Chen et al. [26] proposed FBCCA as an extension of CCA to improve the detection of EEG signals. It uses a filter bank to decompose the EEG signal $X$ into $N_b$ sub-bands $(X^1, \cdots, X^{N_b})$ with different passbands. Then, the correlation coefficient $r_k^i$ between the sub-band component $X^i$ and the reference template $Y_k$ is calculated by CCA. After that, a weighted sum of squares is applied to the correlation coefficients of all sub-bands to calculate the final correlation coefficient $r_k$ between the EEG signal $X$ and the reference template $Y_k$:

$$r_k = \sum_{i=1}^{N_b} a(i) \cdot \left(r_k^i\right)^2, \quad a(i) = i^{-1.25} + 0.25. \quad (5)$$
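To make the computation concrete, here is a numpy-only sketch of the reference template and the largest canonical correlation. The QR-plus-SVD route is a standard numerical recipe for CCA; all function names are ours, and scaling the phase with the harmonic index is one common convention rather than something taken from this paper.

```python
import numpy as np

def sine_cosine_ref(f_k, phi_k, n_h, n_s, fs):
    """Sine-cosine reference template Y_k, shape (2*n_h, n_s)."""
    t = np.arange(1, n_s + 1) / fs
    rows = []
    for h in range(1, n_h + 1):
        # one common convention scales the phase with the harmonic index
        rows.append(np.sin(2 * np.pi * h * f_k * t + h * phi_k))
        rows.append(np.cos(2 * np.pi * h * f_k * t + h * phi_k))
    return np.vstack(rows)

def cca_corr(X, Y):
    """Largest canonical correlation between X (N_ch x N_s) and Y (2*N_h x N_s)."""
    X = X - X.mean(axis=1, keepdims=True)   # center each channel
    Y = Y - Y.mean(axis=1, keepdims=True)
    Qx, _ = np.linalg.qr(X.T)               # orthonormal basis of each view
    Qy, _ = np.linalg.qr(Y.T)
    # singular values of Qx^T Qy are the canonical correlations
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]
```

For FBCCA, `cca_corr` would simply be applied to each sub-band component in turn and the resulting coefficients combined with the weights $a(i) = i^{-1.25} + 0.25$.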

B. TRCA-Based Approaches
TRCA is a method to extract task-related components by maximizing their reproducibility during task periods, originally proposed by Tanaka et al. [27] and later introduced by Nakanishi et al. [15] for SSVEP-based BCIs. For the $k$-th stimulus, TRCA extracts a spatial filter $w_k \in \mathbb{R}^{N_{ch} \times 1}$ by maximizing the inter-trial covariance to suppress task-unrelated signal components as much as possible, i.e.,

$$\hat{w}_k = \arg\max_{w} \; w^T S_k w, \quad S_k = \sum_{\substack{i,j=1 \\ i \neq j}}^{N_{cali}} X_{k,i} X_{k,j}^T, \quad (6)$$

where $S_k \in \mathbb{R}^{N_{ch} \times N_{ch}}$ denotes the sum of the inter-trial covariances of the calibration data for the $k$-th stimulus, $N_{cali}$ is the number of calibration trials for each stimulus, and $X_{k,i}$ denotes the $i$-th trial of the $k$-th stimulus.

In order to have a finite solution to (6), the variable $Q_k \in \mathbb{R}^{N_{ch} \times N_{ch}}$ is defined as:

$$Q_k = \chi_k \chi_k^T, \quad (7)$$

where $\chi_k = \left[ X_{k,1}, \cdots, X_{k,N_{cali}} \right]$ is the concatenated matrix of all calibration data for the $k$-th stimulus.

Then the above problem can be solved by the following equation:

$$\hat{w}_k = \arg\max_{w} \frac{w^T S_k w}{w^T Q_k w}, \quad (8)$$

whose solution is the eigenvector of $Q_k^{-1} S_k$ associated with the largest eigenvalue. After that, TRCA calculates the correlation coefficient between the EEG signal $X$ and the subject-specific averaged reference template $\bar{X}_k$:

$$r_k = \rho\left(w_k^T X,\; w_k^T \bar{X}_k\right), \quad (9)$$

where

$$\bar{X}_k = \frac{1}{N_{cali}} \sum_{i=1}^{N_{cali}} X_{k,i}.$$

Furthermore, Nakanishi et al. [15] proposed ensemble TRCA (eTRCA) as an extension of TRCA to further improve performance by integrating the spatial filters of all stimuli, i.e.,

$$W = \left[ w_1, w_2, \cdots, w_{N_f} \right],$$

after which (9) is modified as follows:

$$r_k = \rho\left(W^T X,\; W^T \bar{X}_k\right).$$

Finally, the frequency of the EEG signal $f_t$ can be determined by (4).
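The sums in the TRCA objective above can be computed without an explicit double loop, using the identity $\sum_{i \neq j} X_i X_j^T = \left(\sum_i X_i\right)\left(\sum_i X_i\right)^T - \sum_i X_i X_i^T$. A minimal numpy sketch (function name and centering convention are ours):

```python
import numpy as np

def trca_filter(trials):
    """TRCA spatial filter for one stimulus.

    trials: array (n_trials, n_ch, n_s) of calibration epochs.
    Returns w (n_ch,), the leading eigenvector of Q^{-1} S.
    """
    n_trials, n_ch, n_s = trials.shape
    trials = trials - trials.mean(axis=2, keepdims=True)  # center each trial
    concat = trials.transpose(1, 0, 2).reshape(n_ch, -1)  # chi_k: (n_ch, n_trials*n_s)
    Q = concat @ concat.T                                 # Q = chi chi^T
    total = trials.sum(axis=0)
    # S = sum_{i != j} X_i X_j^T via the rank-one identity above
    S = total @ total.T - sum(X @ X.T for X in trials)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Q, S))
    return np.real(eigvecs[:, np.argmax(np.real(eigvals))])
```

The ensemble variant then simply stacks the filters of all $N_f$ stimuli into $W$ and applies them jointly.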

C. Least-Squares Transformation
Chiang et al. [19] proposed LST for SSVEP cross-subject transfer learning to reduce the calibration effort for the target subject.
LST utilizes least-squares regression to construct a transformation matrix ${}^{scr_j}P_{k,i} \in \mathbb{R}^{N_{ch} \times N_{ch}}$ for the target subject as follows:

$${}^{scr_j}P_{k,i} = \arg\min_{P} \left\| {}^{tar}\bar{X}_k - P \, {}^{scr_j}X_{k,i} \right\|_F^2,$$

where ${}^{tar}X_{k,i} \in \mathbb{R}^{N_{ch} \times N_s}$ and ${}^{scr_j}X_{k,i} \in \mathbb{R}^{N_{ch} \times N_s}$ denote the calibration data of the $i$-th trial of the $k$-th stimulus for the target subject and the $j$-th source subject, respectively, and ${}^{tar}\bar{X}_k \in \mathbb{R}^{N_{ch} \times N_s}$ denotes the averaged calibration data for the $k$-th stimulus of the target subject, calculated as:

$${}^{tar}\bar{X}_k = \frac{1}{{}^{tar}N_{cali}} \sum_{i=1}^{{}^{tar}N_{cali}} {}^{tar}X_{k,i},$$

where ${}^{tar}N_{cali}$ is the number of calibration trials for each stimulus of the target subject.

Fig. 1. The overall framework of multi-stimulus LST with online adaptation scheme (ms-LST-OA).
After that, the calibration data of the source subject can be transformed into calibration data for the target subject by:

$${}^{scr_j}X'_{k,i} = {}^{scr_j}P_{k,i} \, {}^{scr_j}X_{k,i}.$$

Finally, the calibration dataset of the target subject ${}^{tar}D'$ consists of the original calibration data of the target subject and the transformed calibration data of the source subjects, i.e.,

$${}^{tar}D' = \left\{ {}^{tar}X_{k,i} \right\}_{i=1}^{{}^{tar}N_{cali}} \oplus \left\{ {}^{scr_j}X'_{k,i} \right\}_{j=1,\,i=1}^{N_{sub},\; {}^{scr_j}N_{cali}}, \quad k = 1, \cdots, N_f,$$

where $N_{sub}$ is the number of source subjects, ${}^{scr_j}N_{cali}$ is the number of calibration trials for each stimulus of the $j$-th source subject, and $\oplus$ indicates the data merging operation.
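The LST least-squares problem above has the standard closed-form solution via the Moore-Penrose pseudoinverse. A small sketch (function name ours):

```python
import numpy as np

def lst_matrix(X_src, Xbar_tar):
    """Transformation matrix P = argmin_P || Xbar_tar - P X_src ||_F^2.

    X_src, Xbar_tar: (n_ch, n_s). The closed-form minimizer is
    P = Xbar_tar @ pinv(X_src); the transformed trial is then P @ X_src.
    """
    return Xbar_tar @ np.linalg.pinv(X_src)
```

By construction, the transformed source trial `P @ X_src` is never a worse match to the target template than the untransformed trial, since `P = I` is always a feasible candidate.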

III. METHODS
In this study, we propose the ms-LST-OA to reduce the calibration effort for SSVEP-based BCIs, as displayed in Fig. 1. This approach consists of two parts: ms-LST and the online adaptation scheme.

A. Multi-Stimulus LST
The SSVEPs elicited by stimuli of neighboring frequencies have similar impulse responses, which can assist LST in constructing transformation matrices with higher precision [21]. Furthermore, a cross-stimulus learning scheme using neighboring stimuli can also improve robustness to insufficient calibration data [22]. Therefore, the proposed ms-LST utilizes the data from neighboring stimuli to construct a transformation matrix for each stimulus, which improves the precision of the transformation matrices. The overall framework of ms-LST is shown in Fig. 2.
To transform the calibration data of the $j$-th source subject for the target subject, we first sort the dataset of the target subject ${}^{tar}D$ and the dataset of the $j$-th source subject ${}^{scr_j}D$ according to the frequencies of the stimuli, using stimulus indexes $a_i \in \{1, \cdots, N_f\}$ that satisfy:

$$f_{a_1} < f_{a_2} < \cdots < f_{a_{N_f}}.$$

As in [20], the target subject can use data from only $K$ stimuli for calibration, with $K \leqslant N_f$; these $K$ stimuli are chosen to be uniformly distributed over the $N_f$ frequencies. Then, we average the data of these $K$ stimuli to obtain $K$ subject-specific templates, while the other $N_f - K$ templates are filled with zero matrices.

Unlike LST, ms-LST utilizes data not only from the target stimulus (which flashes at $f_{a_k}$) but also from its neighboring stimuli (which flash at frequencies near $f_{a_k}$) to construct the transformation matrix for the target stimulus by solving the following problem:

$${}^{scr_j}P_{a_k} = \arg\min_{P} \left\| {}^{tar}\chi_{a_k} - P \, {}^{scr_j}\chi_{a_k} \right\|_F^2, \quad (25)$$

where ${}^{tar}\chi_{a_k}$ and ${}^{scr_j}\chi_{a_k}$ denote the concatenated subject-specific templates for the stimulus of frequency $f_{a_k}$ and its neighboring stimuli for the target subject and the $j$-th source subject, respectively:

$${}^{tar}\chi_{a_k} = \left[ {}^{tar}\bar{X}_{a_{k-p}}, \cdots, {}^{tar}\bar{X}_{a_{k+q}} \right], \quad {}^{scr_j}\chi_{a_k} = \left[ {}^{scr_j}\bar{X}_{a_{k-p}}, \cdots, {}^{scr_j}\bar{X}_{a_{k+q}} \right],$$

where ${}^{tar}\bar{X}_{a_k}$ and ${}^{scr_j}\bar{X}_{a_k}$ denote the subject-specific templates of frequency $f_{a_k}$ for the target subject and the $j$-th source subject, respectively, and ${}^{scr_j}\bar{X}_{a_k}$ is calculated as:

$${}^{scr_j}\bar{X}_{a_k} = \frac{1}{{}^{scr_j}N_{cali}} \sum_{i=1}^{{}^{scr_j}N_{cali}} {}^{scr_j}X_{a_k,i}.$$

As in [22], the indexes from $a_{k-p}$ to $a_{k+q}$ span the range of neighboring stimuli $s$ for the stimulus of frequency $f_{a_k}$, with $s = p + q + 1$. To ensure that at least one template falls within the range of neighboring stimuli, $s \geqslant N_f / K$. Setting the parameter $s_0 = s/2$ when $s$ is even and $s_0 = (s-1)/2$ when $s$ is odd, the values of $p$ and $q$ can be determined as $p = s_0$ and $q = s - s_0 - 1$, truncated at the ends of the frequency range.

Then the transformation matrix ${}^{scr_j}P_{a_k}$ for the $a_k$-th stimulus of the $j$-th source subject can be calculated in closed form as:

$${}^{scr_j}P_{a_k} = {}^{tar}\chi_{a_k} \, {}^{scr_j}\chi_{a_k}^T \left( {}^{scr_j}\chi_{a_k} \, {}^{scr_j}\chi_{a_k}^T \right)^{-1}.$$

Finally, the transformation matrix is applied to each trial of the $a_k$-th stimulus of the $j$-th source subject:

$${}^{scr_j}X'_{a_k,i} = {}^{scr_j}P_{a_k} \, {}^{scr_j}X_{a_k,i}.$$

After the transformation is completed for all trials of all source subjects, the final calibration dataset for the target subject is expanded as:

$${}^{tar}D' = \left\{ {}^{tar}X_{a_k,i} \right\} \oplus \left\{ {}^{scr_j}X'_{a_k,i} \right\}_{j=1}^{N_{sub}}, \quad k = 1, \cdots, N_f.$$

We use the final calibration data ${}^{tar}D'$ to calibrate the eTRCA recognition algorithm. Previous studies have shown that an ensemble classifier has higher classification performance and that sine-cosine templates also contribute to SSVEP recognition [14]. Therefore, we use an ensemble classifier of eTRCA and FBCCA to determine the classification results, combining the correlation coefficients calculated by the two methods:

$$r_k = \sum_{l=1}^{2} \operatorname{sign}\left(r_{k,l}\right) \cdot r_{k,l}^2, \quad (29)$$

where $r_{k,1}$ and $r_{k,2}$ denote the correlation coefficients of FBCCA and eTRCA for the $k$-th stimulus, respectively. After calculating the correlation coefficient $r_k$, the frequency of the stimulus is determined according to (4).
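The only mechanical difference from plain LST is that the templates of the $p$ preceding and $q$ following frequency-sorted stimuli are concatenated along the sample axis before the least-squares fit, so neighboring frequencies inform each transformation matrix. A hedged numpy sketch (function name and the boundary truncation are ours):

```python
import numpy as np

def ms_lst_matrix(tar_templates, src_templates, k, p, q):
    """ms-LST transformation matrix for the k-th frequency-sorted stimulus.

    tar_templates / src_templates: lists of averaged templates (n_ch x n_s),
    one per stimulus, sorted by frequency. Templates of stimuli k-p .. k+q
    are concatenated so that neighboring frequencies inform the fit.
    """
    lo = max(0, k - p)                        # truncate at the band edges
    hi = min(len(src_templates) - 1, k + q)
    chi_tar = np.hstack(tar_templates[lo:hi + 1])
    chi_src = np.hstack(src_templates[lo:hi + 1])
    return chi_tar @ np.linalg.pinv(chi_src)  # least-squares solution
```

When the source templates are an invertible channel mixing of the target templates, this recovers the inverse mixing exactly, which is the idealized case the least-squares fit approximates on real data.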

B. Online Adaptation Scheme
To further optimize the spatial filters and reference templates constructed by eTRCA using ${}^{tar}D'$, as in [24] and [23], this paper uses an online adaptation scheme to switch eTRCA to an online learning mode. The overall framework of the online adaptation scheme is shown in Fig. 3.
We set the state after calibrating the recognition algorithm with the ms-LST-transformed data as the 0-th recognition. Suppose that the test data $X$ is recognized as the $k$-th stimulus at the $c$-th recognition. Then the reference template and spatial filter for the $k$-th stimulus need to be updated.
Firstly, the number of calibration trials $N^{[c]}_{cali,k}$ and the reference template $\bar{X}^{[c]}_k$ for the $k$-th stimulus are updated:

$$N^{[c]}_{cali,k} = N^{[c-1]}_{cali,k} + 1, \quad \bar{X}^{[c]}_k = \frac{N^{[c-1]}_{cali,k} \, \bar{X}^{[c-1]}_k + X}{N^{[c]}_{cali,k}},$$

where $N^{[c]}_{cali,k}$ is the number of calibration trials for the $k$-th stimulus at the $c$-th recognition, and $\bar{X}^{[c]}_k$ denotes the subject-specific averaged reference template for the $k$-th stimulus at the $c$-th recognition.

Then, according to (8), in order to update the eTRCA-based spatial filter, the covariance matrices $S^{[c]}_k$ and $Q^{[c]}_k$ at the $c$-th recognition need to be updated. According to (6), for the test data $X$, $S^{[c]}_k$ can be updated as:

$$S^{[c]}_k = S^{[c-1]}_k + N^{[c-1]}_{cali,k} \left( \bar{X}^{[c-1]}_k X^T + X \left(\bar{X}^{[c-1]}_k\right)^T \right),$$

where $S^{[c]}_k$ denotes the covariance matrix $S_k$ of the $k$-th stimulus at the $c$-th recognition.

According to (7), $Q^{[c]}_k$ can be calculated as:

$$Q^{[c]}_k = \chi^{[c]}_k \left(\chi^{[c]}_k\right)^T,$$

where $\chi^{[c]}_k$ denotes the concatenation of the SSVEP data for the $k$-th stimulus at the $c$-th recognition. To facilitate the use of $\bar{X}^{[c]}_k$ to update $Q^{[c]}_k$, the variable $Z^{[c]}_k$ is defined as the running sum of per-trial autocovariances:

$$Z^{[c]}_k = \sum_{i} X_{k,i} X_{k,i}^T.$$

For the test data $X$, $Z^{[c]}_k$ can be updated as:

$$Z^{[c]}_k = Z^{[c-1]}_k + X X^T.$$

Thus, $Q^{[c]}_k$ can be updated as:

$$Q^{[c]}_k = Z^{[c]}_k - N^{[c]}_{cali,k} N_s \, \mu\!\left(\bar{X}^{[c]}_k, 2\right) \mu\!\left(\bar{X}^{[c]}_k, 2\right)^T,$$

where the function $\mu(\cdot, 2)$ represents the mean of the matrix $\bar{X}^{[c]}_k$ along the dimension of the sampling points and results in a vector in $\mathbb{R}^{N_{ch} \times 1}$. Finally, we can update the spatial filter $w^{[c]}_k$ as the eigenvector of $\left(Q^{[c]}_k\right)^{-1} S^{[c]}_k$ associated with the largest eigenvalue. In this way, the $c$-th update of the spatial filter and the reference template is completed.
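The update rules above amount to maintaining running statistics per stimulus: the trial count, the running sum of trials (for the template and $S$), and the running sum of per-trial autocovariances $Z$ (for $Q$). A sketch of that bookkeeping, with each incoming trial mean-centered on arrival so the mean-correction term is handled implicitly (class and names are ours, not the paper's):

```python
import numpy as np

class OnlineStimulusState:
    """Running statistics for one stimulus, updated trial-by-trial."""

    def __init__(self, n_ch, n_s):
        self.n = 0                            # trial count N_cali,k
        self.sum_x = np.zeros((n_ch, n_s))    # running sum of trials
        self.Z = np.zeros((n_ch, n_ch))       # running sum of X X^T

    def update(self, X):
        X = X - X.mean(axis=1, keepdims=True)  # center: mean term handled here
        self.n += 1
        self.sum_x += X
        self.Z += X @ X.T

    @property
    def template(self):
        """Averaged reference template after n updates."""
        return self.sum_x / max(self.n, 1)

    def spatial_filter(self):
        # S = sum_{i != j} X_i X_j^T = (sum X_i)(sum X_i)^T - Z; Q reduces to Z
        S = self.sum_x @ self.sum_x.T - self.Z
        vals, vecs = np.linalg.eig(np.linalg.solve(self.Z, S))
        return np.real(vecs[:, np.argmax(np.real(vals))])
```

Each recognized trial costs one rank-one update per matrix, so the adaptation adds negligible latency to the online loop.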
Additionally, in order to reduce the effect of noise and improve the performance of the online adaptation scheme, the data from source subjects is averaged across trials before transforming the data using ms-LST.

IV. EXPERIMENTS
To evaluate the performance of ms-LST-OA, we conduct experiments on three datasets using the leave-one-out cross-validation approach, i.e., one subject is used as the target subject while the others are used as source subjects. Furthermore, part of the data of the target subject is used as calibration data and the rest as test data.

A. SSVEP Datasets
A total of three SSVEP datasets are used in this paper for performance testing: 1) Dataset I: the Benchmark Dataset was presented by Wang et al. [28]. This dataset contains EEG data from 35 healthy subjects. For each subject, the data contain 6 blocks, and each block contains 40 trials corresponding to 40 stimuli from 8.0 Hz to 15.8 Hz with an interval of 0.2 Hz.

TABLE II
THE PARTIAL PARAMETERS OF THESE THREE DATASETS
2) Dataset II: the BETA Dataset was presented by Liu et al. [29]. This dataset contains EEG data from 70 healthy subjects. For each subject, the data contain 4 blocks, and each block contains 40 trials corresponding to 40 stimuli from 8.0 Hz to 15.8 Hz with an interval of 0.2 Hz. 3) Dataset III: the UCSD Dataset was presented by Nakanishi et al. [30]. This dataset contains EEG data from 10 healthy subjects. For each subject, the data contain 15 blocks, and each block contains 12 trials corresponding to 12 stimuli from 9.25 Hz to 14.75 Hz with an interval of 0.5 Hz. Selected parameters of these three datasets are presented in Table II, which lists the number of channels, the number of stimuli, the number of trials for each stimulus, and the number of subjects in each dataset.

B. Data Preprocessing
Considering the latency of the SSVEP response, we used the data segment [T_l, T_l + T_w] after stimulus onset for analysis, where T_l is the SSVEP latency (T_l = 0.14 s in this paper) and T_w denotes the time window (T_w ∈ {0.4 s, 0.5 s, · · · , 1.0 s}). In addition, for Datasets I and II, nine electrodes (Pz, PO5, PO3, POz, PO4, PO6, O1, Oz, and O2) were used for analysis; for Dataset III, all electrodes were used. After data formatting, power-line noise was removed from each trial using a 50 Hz notch filter. Finally, the EEG data of each trial were decomposed into N_b = 5 sub-bands using different bandpass filters, where the lower and upper cut-off frequencies of the i-th sub-band were set to (i × 8 − 2) Hz and 90 Hz, respectively.
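As a self-contained stand-in for the sub-band decomposition, the sketch below uses an ideal FFT mask in place of the paper's bandpass filters (a real implementation would typically use IIR filters such as Chebyshev designs); the passbands follow the (8i − 2) Hz to 90 Hz scheme above:

```python
import numpy as np

def filter_bank(x, fs, n_b=5, f_high=90.0):
    """Decompose a trial (..., n_samples) into n_b sub-bands.

    The i-th band passes (8*i - 2) Hz .. f_high Hz via a brick-wall
    mask in the frequency domain (numpy-only stand-in for bandpass filters).
    """
    n = x.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    X = np.fft.rfft(x, axis=-1)
    bands = []
    for i in range(1, n_b + 1):
        lo = 8.0 * i - 2.0
        mask = (freqs >= lo) & (freqs <= f_high)
        bands.append(np.fft.irfft(X * mask, n=n, axis=-1))
    return np.stack(bands)
```

A 10 Hz component survives the first sub-band (6–90 Hz) but is removed by the second (14–90 Hz), which is exactly the behavior FBCCA exploits when weighting the sub-band correlations.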

C. Performance Evaluation
We use the averaged classification accuracy (ACC) and the averaged ITR to measure the performance of the involved methods. In particular, the ITR (in bits/min) is calculated as:

$$ITR = \frac{60}{T} \left[ \log_2 N_f + P \log_2 P + (1 - P) \log_2 \frac{1 - P}{N_f - 1} \right],$$

where $P$ is the ACC, $N_f$ is the number of stimuli, and $T$ denotes the time required to complete one recognition, $T = T_w + 0.5$ s, where 0.5 s is the gaze-shifting time.
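The ITR computation can be sketched directly from the formula above (function name ours; the guard clauses handle the P = 1 and chance-level cases where the log terms degenerate):

```python
import numpy as np

def itr_bits_per_min(n_f, p, t_sec):
    """Wolpaw ITR in bits/min for n_f targets, accuracy p, selection time t_sec."""
    if p <= 1.0 / n_f:
        return 0.0            # at or below chance: no information transferred
    if p >= 1.0:
        bits = np.log2(n_f)   # the p*log2(p) and (1-p) terms vanish
    else:
        bits = (np.log2(n_f) + p * np.log2(p)
                + (1.0 - p) * np.log2((1.0 - p) / (n_f - 1)))
    return 60.0 / t_sec * bits
```

For instance, 40 targets at 90% accuracy with T = 0.6 s + 0.5 s gives roughly 236 bits/min, the scale of the values reported in the Results.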

D. Data Analysis
The following experiments were conducted to test the performance of ms-LST-OA. All LST-based methods use the final calibration data ${}^{tar}D'$ to train the ensemble classifier of eTRCA and FBCCA to recognize SSVEP data based on (29).
1) Parameter Exploration of ms-LST: According to (25), several parameters affect the performance of ms-LST, such as the range of neighboring stimuli $s$, the number of data channels $N_{ch}$, the time window $T_w$, and the amount of calibration data ${}^{tar}N_{cali}$. As the LST-based methods are used to reduce calibration effort, only a very small amount of calibration data was used in the experiment (${}^{tar}N_{cali} = 1$ and $K = N_f$). This section therefore explores the influence of the other parameters ($s$, $N_{ch}$, and $T_w$) on the performance of ms-LST. Since Dataset III has only 8 channels, the experiment on the number of data channels was conducted only on Datasets I and II.
2) Effectiveness of ms-LST-OA: To verify the effectiveness of ms-LST-OA, the following experiments were conducted: 1) To demonstrate that the cross-stimulus learning scheme can improve the performance of LST, ms-LST is compared with LST and sd-LST. 2) To prove that the online adaptation scheme can further optimize the spatial filters and reference templates, ms-LST-OA is compared with ms-LST. All experiments were conducted at ${}^{tar}N_{cali} = 1$ and $K \in [1, N_f]$.
3) Ablation Experiment: Compared with LST, the proposed ms-LST-OA adopts a cross-stimulus learning scheme that uses data from neighboring stimuli to construct the transformation matrices, and utilizes an online adaptation scheme to optimize the spatial filters and reference templates. An ablation experiment was therefore conducted to explore the effects of these improvements. In the experiment, only one calibration trial was used per stimulus, i.e., ${}^{tar}N_{cali} = 1$ and $K = N_f$.

4) The Ability of ms-LST-OA to Reduce Calibration Effort:
In this section, to explore the ability of ms-LST-OA to reduce calibration effort, only a small amount of data is used for calibration, while each calibration-based approach uses as much data as needed to make its ITR comparable to that of ms-LST-OA. The ability of ms-LST-OA to reduce calibration effort is reflected by comparing the ITR and the amount of calibration data at different time windows T_w ∈ {0.4 s, 0.5 s, · · · , 1.0 s}. Besides ms-LST-OA, the compared approaches are OACCA [24], eCCA [14], eTRCA [15], TDCA [16], stCCA [31], and sd-LST [20], where OACCA is the state-of-the-art calibration-free algorithm, eCCA and eTRCA are classical calibration-based algorithms, TDCA is the state-of-the-art calibration-based algorithm, and stCCA and sd-LST are state-of-the-art transfer learning methods.

V. RESULTS

A. Parameter Exploration of ms-LST
The ITR of ms-LST for Datasets I, II, and III at the number of templates K = N_f with different ranges of neighboring stimuli s is shown in Fig. 4. The ITR peaked at s = 11 (213.15 ± 10.86 bits/min), s = 14 (181.33 ± 7.51 bits/min), and s = 7 (137.20 ± 13.45 bits/min) for the three datasets, respectively. In addition, at N_f = 40 and K = N_f, we averaged the ITR of ms-LST for Datasets I and II at each s and obtained the optimal s = 11 (which maximizes the averaged ITR, 197.05 ± 9.23 bits/min). At N_f = 12 and K = N_f, the optimal s is 7 according to the results for Dataset III. To cover a more general scenario, we repeated the experiment at different numbers of templates K and ranges of neighboring stimuli s, obtaining the optimal s at each K for N_f = 40 and N_f = 12, respectively. Finally, polynomial fitting was used to fit the optimal s as a function of K, yielding one fitted relationship for Datasets I and II and another for Dataset III.
The averaged ITR across subjects of ms-LST for Datasets I and II at different numbers of channels is shown in Fig. 5. This experiment was performed only on Datasets I and II because Dataset III has only 8 channels. We measured the ITR of ms-LST when only SSVEP-related channels were used (8 and 9 channels) and when SSVEP-unrelated channels were added (more than 9 channels). When only SSVEP-related channels were utilized, the average ITR was maximized with all SSVEP-related channels in use (213.15 ± 10.86 bits/min for Dataset I and 180.94 ± 7.60 bits/min for Dataset II at N_ch = 9). When a small number of SSVEP-unrelated channels were added, the ITR remained close to that with all SSVEP-related channels (213.60 ± 10.86 bits/min for Dataset I and 180.30 ± 7.64 bits/min for Dataset II at N_ch = 12, only a very slight difference from N_ch = 9). However, as the number of SSVEP-unrelated channels continued to increase, the ITR decreased (204.80 ± 11.74 bits/min for Dataset I and 156.83 ± 8.64 bits/min for Dataset II at N_ch = 64), a large decrease compared to N_ch = 9. Therefore, utilizing SSVEP-unrelated channels was not helpful for ms-LST, so only the SSVEP-related channels were used in the following experiments.
Fig. 6 illustrates the performance of ms-LST under different time windows. Generally, a high ITR is achieved by obtaining a higher ACC in a shorter time. For Datasets I and II, ms-LST reaches the highest ITR at 0.6 s (213.15 ± 10.86 bits/min and 180.94 ± 7.60 bits/min), and at 0.5 s for Dataset III (137.85 ± 15.82 bits/min). This shows that shorter time windows (e.g., 0.6 s or 0.7 s) can be chosen to obtain higher performance.

B. Effectiveness of ms-LST-OA
To demonstrate that the cross-stimulus learning scheme can improve the performance of LST, we compared the ITR of ms-LST with those of LST and sd-LST at T_w = 0.6 s with different K. The results of the comparison are displayed in Fig. 7. For all three datasets, ms-LST obtained an ITR exceeding that of LST using data from only a small number of templates. Furthermore, when the number of templates was small, the ITR of ms-LST was comparable to that of sd-LST; but as the number of templates increased, the ITR of sd-LST increased only slightly, whereas that of ms-LST increased more significantly. In addition, we calculated the significance of the differences between the ITRs of ms-LST and sd-LST at different numbers of templates using paired t-tests and corrected the results using the Bonferroni method.
The results show that ms-LST significantly outperforms sd-LST as K increases for all three datasets (p < 0.05 when K ⩾ 10 for Dataset I, K ⩾ 13 for Dataset II, and K ⩾ 10 for Dataset III). Thus, as the number of templates increases, ms-LST achieves a higher ITR than sd-LST, and the performance difference becomes increasingly significant. These results suggest that the cross-stimulus learning scheme can greatly improve LST performance.
To prove that the online adaptation scheme can further optimize the spatial filters and reference templates, we compared the ITR of ms-LST-OA and ms-LST at T_w = 0.6 s with different K. The results of the comparison are displayed in Fig. 8. For all three datasets, the ITR was higher for ms-LST-OA than for ms-LST at all numbers of templates. Furthermore, we also calculated the significance of the difference between ms-LST-OA and ms-LST at different numbers of templates using paired t-tests and corrected the results using the Bonferroni method. For Datasets I and II, the ITR of ms-LST-OA was significantly higher than that of ms-LST for all K ∈ [1, 40] (p < 0.001). For Dataset III, the performance difference between ms-LST-OA and ms-LST decreased as K increased (p < 0.01 at K = 1, p < 0.05 at K = 6, and no significant difference at K = 12). Fig. 9 illustrates the averaged performance change of ms-LST-OA with the number of trials at K = 5; the result shows that the performance difference between ms-LST-OA and ms-LST gradually grew. Therefore, these experimental results demonstrate that the online adaptation scheme can further optimize the spatial filters and reference templates constructed from ms-LST-transformed data to improve performance.
Consequently, ms-LST-OA, which uses the cross-stimulus learning scheme and the online adaptation scheme, achieves higher performance than current LST methods.

C. Ablation Experiment
In the ablation experiment, we compared the performance of LST, ms-LST, LST with online adaptation scheme (LST-OA), and ms-LST-OA at K = 40 with different time windows. The results are displayed in Fig. 10. These methods achieve the highest averaged ITR at different T_w (LST: 187.65 ± 9.73 bits/min at 0.8 s; ms-LST: 213.15 ± 10.86 bits/min at 0.6 s; LST-OA: 197.44 ± 9.30 bits/min at 0.8 s; ms-LST-OA: 218.33 ± 11.23 bits/min at 0.6 s). Furthermore, significant differences were calculated using paired t-tests and corrected by the Bonferroni method (between LST and ms-LST, LST and LST-OA, and ms-LST and LST-OA). Firstly, the averaged ACC and ITR of ms-LST and LST-OA were significantly higher than those of LST for all three datasets (p < 0.001). This shows that the performance of the LST method can be improved by both the cross-stimulus learning scheme and the online adaptation scheme. Secondly, ms-LST had higher averaged ACC and ITR than LST-OA for all three datasets (p < 0.001 at most time windows for Datasets I and II; p < 0.05 at time windows from 0.4 s to 0.6 s for Dataset III, with no significant difference at the other time windows). This demonstrates that the cross-stimulus learning scheme leads to a larger performance improvement than the online adaptation scheme. Moreover, ms-LST-OA applies the online adaptation scheme on the basis of ms-LST to gradually optimize the spatial filters and reference templates, which further enhances the ITR.

Fig. 12. Blue (red) color indicates that the ITR of ms-LST-OA is lower (higher) than that of the compared method; the color depth indicates the level of significant difference between ms-LST-OA and the compared method, with darker colors indicating more significant differences.

D. The Ability of ms-LST-OA to Reduce Calibration Effort
To measure the ability of ms-LST-OA to reduce calibration effort, an experiment was conducted to compare the performance of ms-LST-OA with the other methods. The averaged ITRs across subjects for the different methods are displayed in Fig. 11, and Table III shows the highest averaged ITR of each approach. It can be found that: 1) In contrast to OACCA, a calibration-free online adaptive method, ms-LST-OA required only a small amount of calibration data to achieve a higher ITR. 2) Compared with eCCA, eTRCA, and TDCA, which are calibration-based methods requiring a large amount of calibration data, ms-LST-OA achieved high performance on all three datasets using much less calibration data. 3) Compared with stCCA and sd-LST, which are transfer learning methods, ms-LST-OA achieved a higher ITR on all three datasets using the same amount of data.
Fig. 12 shows the paired t-test results of ms-LST-OA against the other methods, corrected by the Bonferroni method. It can be observed that: 1) ms-LST-OA significantly outperformed OACCA on all three datasets (p < 0.001). 2) For all three datasets, ms-LST-OA used less calibration data, yet in most cases there were no significant differences compared with eCCA, eTRCA, and TDCA, which use a large amount of data. This demonstrates the ability of ms-LST-OA to greatly reduce calibration effort. 3) For the three datasets, ms-LST-OA was superior to stCCA (p < 0.001 for Datasets I and II and p < 0.05 for Dataset III in most cases). ms-LST-OA also performed better than sd-LST for Datasets I and II (p < 0.01 mostly for Dataset I and p < 0.05 mostly for Dataset II), whereas there was essentially no significant difference for Dataset III. This indicates that ms-LST-OA is an excellent transfer learning method.
All results indicate that ms-LST-OA can significantly reduce the calibration effort while achieving a higher ITR.
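For reference, the ITR figures compared above are conventionally computed with the standard Wolpaw formula used in SSVEP-BCI studies. A minimal sketch, assuming the usual convention that the selection time is the data window Tw plus a gaze-shift interval:

```python
from math import log2

def itr_bits_per_min(n_targets, accuracy, selection_time_s):
    """Wolpaw information transfer rate in bits/min.

    n_targets:        number of stimulus targets (e.g. 40)
    accuracy:         classification accuracy P, with 1/N < P <= 1
    selection_time_s: time per selection (data window plus gaze shift)
    """
    n, p = n_targets, accuracy
    bits = log2(n)
    if 0.0 < p < 1.0:
        bits += p * log2(p) + (1 - p) * log2((1 - p) / (n - 1))
    return bits * 60.0 / selection_time_s

# Example: 40 targets, 90% accuracy, 0.6 s window + 0.5 s gaze shift.
itr = itr_bits_per_min(40, 0.9, 1.1)
```

At perfect accuracy the formula reduces to 60 * log2(N) / T, which is why shorter time windows at comparable accuracy translate directly into higher ITR.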

VI. CONCLUSION
To reduce the calibration effort for high-speed SSVEP-based BCIs, this study proposes ms-LST-OA, which uses a cross-stimulus learning scheme and an online adaptation scheme to improve the LST. Our experimental results indicate that ms-LST-OA requires only a small amount of calibration data to achieve performance comparable to calibration-based recognition algorithms. This means that ms-LST-OA can eliminate a large amount of calibration while achieving a high ITR, which is helpful for the application of SSVEP-based BCIs.
Furthermore, the proposed method could be improved. Due to the restricted specifications of the public datasets, more online experiments are needed to determine the relationship between the range of neighboring stimuli (s) and the number of templates (K) at different numbers of stimuli (N_f). Future research will focus on improving the applicability of the proposed method.

Fig. 3. The overall framework of the online adaptation scheme.

Fig. 4. The ITR of ms-LST-OA was tested for Dataset I (a), II (b) and III (c) under different ranges of neighboring frequencies at K = N_f. In the plots, the curves indicate the average ITR of ms-LST-OA across subjects, and the shaded areas indicate standard errors.

Fig. 5. Averaged ITR across subjects for Dataset I (a) and II (b) for ms-LST at different numbers of channels. Error bars indicate standard errors.

Fig. 6. Averaged ACC and ITR across subjects for ms-LST for Dataset I (a), II (b) and III (c). Error bars indicate standard errors.

Fig. 7. The averaged ITR of ms-LST compared to LST and sd-LST at different numbers of templates (K) for Dataset I (a), II (b) and III (c). Shaded areas indicate standard errors. Significant differences between ms-LST and sd-LST are calculated using paired t-tests and corrected using the Bonferroni method. Asterisks indicate significant differences (*: p < 0.05, **: p < 0.01, ***: p < 0.001).

Fig. 8. The averaged ITR of ms-LST-OA compared to ms-LST at different numbers of templates (K) for Dataset I (a), II (b) and III (c). Shaded areas indicate standard errors. Significant differences are calculated using paired t-tests and corrected using the Bonferroni method. Asterisks indicate significant differences between ms-LST-OA and ms-LST (*: p < 0.05, **: p < 0.01, ***: p < 0.001).

Fig. 9. The averaged performance changes of ms-LST-OA with the number of trials at K = 5 for Dataset I (a), II (b) and III (c). The shaded areas indicate the standard error across subjects.

Fig. 10. Results of the ablation experiment for Dataset I (a), II (b) and III (c). Error bars indicate standard errors. Significant differences are calculated using paired t-tests and corrected using the Bonferroni method. Asterisks indicate significant differences (*: p < 0.05, **: p < 0.01, ***: p < 0.001). Blue, grey, and pink indicate the significant differences between LST and ms-LST, LST and LST with the online adaptation scheme (LST-OA), and ms-LST and LST-OA, respectively.

Fig. 11. The averaged ITR across subjects of the different approaches for Dataset I (a), II (b) and III (c) at different time windows. Error bars indicate standard errors.

Fig. 12. The paired t-test results of ms-LST-OA and the other methods, corrected by the Bonferroni method. Blue (red) indicates that the ITR of ms-LST-OA is lower (higher) than that of the compared method, and the color depth indicates the level of significance, with darker colors indicating more significant differences.

TABLE III
THE HIGHEST AVERAGED ITR OF THE DIFFERENT APPROACHES