Small Data Least-Squares Transformation (sd-LST) for Fast Calibration of SSVEP-Based BCIs

Steady-state visual evoked potential (SSVEP) is one of the most popular brain-computer interface (BCI) paradigms, due to its high information transfer rate and signal-to-noise ratio. Many calibration-free and calibration-based approaches have been proposed to improve the performance of SSVEP-based BCIs. This paper considers a quick calibration scenario, in which there are plenty of data from multiple source subjects, but only a small number of calibration trials, from a subset of the stimulus frequencies, for the new subject. We propose small data least-squares transformation (sd-LST) to solve this problem. Experiments on three publicly available SSVEP datasets demonstrated that sd-LST outperformed several classical and state-of-the-art approaches, with only about 10 calibration trials for 40-target SSVEP-based BCI spellers.

An SSVEP is the neural response to a visual stimulus flickering at a specific frequency and phase [9]. In a typical SSVEP-based BCI system, targets flicker at different frequencies and phases on the screen; the user stares at a specific target, which modulates his/her EEG signals. The BCI system can thus decode which target the user is attending to, by matching the frequency and phase of the user's EEG signal with those of the targets. Many different algorithms have been proposed in recent years to improve the performance of SSVEP-based BCIs. They can be grouped into calibration-free and calibration-based ones, according to whether subject-specific calibration data are required [10]. The former includes minimum energy combination [11], standard canonical correlation analysis (sCCA) [2], the multivariate synchronization index [12], Ramanujan periodicity transforms [13], etc. The latter includes extended CCA (eCCA) [14], multi-way CCA [15], task-related component analysis (TRCA) [16], task-discriminant component analysis [17], etc. Calibration-based algorithms usually outperform calibration-free ones, but they require collecting multiple SSVEP trials for every target, which is time-consuming and user-unfriendly [18].
Transfer learning [19], [20], [21] is frequently used to reduce the calibration effort. Its main idea is to utilize data or knowledge of existing subjects (source subjects) to facilitate the calibration of a new subject (target subject). Yuan et al. [22] proposed transfer template-based CCA (tt-CCA), which does not need any calibration data from the target subject. tt-CCA uses the average of all source subjects' templates as the target subject's template, without considering cross-subject variations. Chiang et al. [23] proposed least-squares transformation (LST) to reduce the cross-subject variations. Suefusa and Tanaka [24] transferred a subject's SSVEP template from one stimulus frequency to others. Wong et al. [25] proposed to learn across multiple stimuli, i.e., use data corresponding to the target stimulus and its adjacent stimuli to construct the target spatial filter. Wong et al. [26] also proposed subject transfer based CCA (stCCA) to further reduce the calibration effort. stCCA builds transfer templates as a weighted sum of the training templates from the source subjects, and learns common CCA-based spatial filters using a multi-stimulus scheme [25]. stCCA achieved promising performance using a small amount of calibration data from a subset of the stimuli.
This study considers the same quick calibration scenario as stCCA, i.e., only a very small number of calibration SSVEP trials from a subset of the stimuli of the target subject are available. Unlike stCCA, we use LST to reduce the cross-subject variations, and then use the transformed source data to construct templates and spatial filters for the target subject. Instead of constructing a separate transformation matrix for each source stimulus frequency, as in [23], we construct a common transformation matrix for all frequencies. Experiments on three SSVEP datasets demonstrated that our proposed small data LST (sd-LST) approach achieved performance competitive with several classical or state-of-the-art approaches, with fewer calibration trials.
The remainder of this paper is organized as follows. Section II introduces some preliminaries of SSVEP-based BCIs. Section III proposes sd-LST. Section IV validates the performance of sd-LST on three SSVEP datasets. Finally, Section V draws conclusions.

II. PRELIMINARIES
This section introduces the notations used in this paper, and some preliminaries, including CCA and its variants, and TRCA.

A. Notations

Table I summarizes the main notations used in this paper. Generally, a variable with a right subscript k denotes the k-th stimulus frequency, a variable with a left superscript t is from the target subject, and a variable with a left superscript s and a left subscript j is from the j-th source subject.

B. CCA-Based Approaches
CCA extracts the underlying correlation between two multi-channel time series [27]. Lin et al. [2] were the first to use CCA to enhance the signal-to-noise ratio of SSVEP signals. CCA extracts the frequency information of an SSVEP by calculating the canonical correlation coefficient between the EEG signal and a standard reference signal, which consists of sinusoidal signals at the stimulus frequency and its harmonics. Let $X \in \mathbb{R}^{N_{ch} \times N_s}$ be an EEG trial, where $N_{ch}$ is the number of EEG channels and $N_s$ is the number of time-domain samples, and $Y_k \in \mathbb{R}^{2N_h \times N_s}$ be the standard sine-cosine reference signal for the k-th stimulus frequency $f_k$:

$Y_k = \begin{bmatrix} \sin(2\pi f_k t) \\ \cos(2\pi f_k t) \\ \vdots \\ \sin(2\pi N_h f_k t) \\ \cos(2\pi N_h f_k t) \end{bmatrix}, \quad t = \frac{1}{F_s}, \frac{2}{F_s}, \ldots, \frac{N_s}{F_s},$  (1)

where $N_h$ is the number of harmonics ($N_h = 5$ in this paper) and $F_s$ is the sampling frequency.
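As a concrete illustration, the reference signal in (1) can be generated as follows (a minimal NumPy sketch; the function name and the sample grid starting at 0 are our own choices):

```python
import numpy as np

def reference_signal(f_k, n_samples, fs, n_harmonics=5):
    """Sine-cosine reference Y_k of shape (2 * n_harmonics, n_samples)."""
    t = np.arange(n_samples) / fs          # sample times in seconds
    rows = []
    for h in range(1, n_harmonics + 1):    # fundamental and harmonics
        rows.append(np.sin(2 * np.pi * h * f_k * t))
        rows.append(np.cos(2 * np.pi * h * f_k * t))
    return np.vstack(rows)

# N_h = 5 harmonics give a 10-row reference for an 8 Hz stimulus
Y = reference_signal(8.0, n_samples=250, fs=250.0)
print(Y.shape)  # (10, 250)
```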
The goal of CCA is to find two weight vectors $w_X \in \mathbb{R}^{N_{ch}}$ and $w_{Y_k} \in \mathbb{R}^{2N_h}$ that maximize the Pearson correlation coefficient between the two weighted single-channel signals $X^\top w_X$ and $Y_k^\top w_{Y_k}$, by solving the following maximization problem:

$\rho(X, Y_k) = \max_{w_X,\, w_{Y_k}} \frac{E\left[ w_X^\top X Y_k^\top w_{Y_k} \right]}{\sqrt{E\left[ w_X^\top X X^\top w_X \right] E\left[ w_{Y_k}^\top Y_k Y_k^\top w_{Y_k} \right]}},$  (2)

where $\top$ denotes matrix or vector transpose, and E is the expectation. The canonical correlation coefficient between $X$ and $Y_k$ is then computed as

$\tau_k = \rho(X, Y_k).$  (3)

sCCA uses the following strategy to determine the stimulus frequency $f_c$ of $X$:

$f_c = f_{k^*}, \quad k^* = \arg\max_{1 \le k \le N_f} \tau_k,$  (4)

where $N_f$ is the number of stimuli.
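The maximization in (2) has a closed-form solution: whiten both signals and take the largest singular value of the cross-covariance. A minimal sketch (function name ours; a small ridge term is added for numerical stability):

```python
import numpy as np

def cca_max_corr(X, Y, reg=1e-8):
    """Largest canonical correlation between X (C1 x N) and Y (C2 x N)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Cxx = Xc @ Xc.T + reg * np.eye(Xc.shape[0])
    Cyy = Yc @ Yc.T + reg * np.eye(Yc.shape[0])
    Cxy = Xc @ Yc.T
    # Whiten with inverse Cholesky factors, then SVD of the coherence matrix
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    return np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)[0]
```

For sCCA, this coefficient is computed against every reference $Y_k$, and the frequency with the largest value is selected, as in (4).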
Nakanishi et al. [28] proposed extended CCA (eCCA), which simultaneously uses subject-specific templates, obtained by averaging all $N_{cali}$ calibration trials of the same frequency $f_k$:

$\bar{X}_k = \frac{1}{N_{cali}} \sum_{i=1}^{N_{cali}} X_{k,i},$  (5)

and the standard sine-cosine reference signals, to calculate four correlation coefficients $\{\tau_{k,i}\}_{i=1}^4$. The final correlation coefficient $\tau_k$ is determined as

$\tau_k = \sum_{i=1}^{4} \operatorname{sign}(\tau_{k,i}) \cdot \tau_{k,i}^2,$  (6)

and (4) can then be used to determine the stimulus frequency $f_c$. Wong et al. [25] recently proposed multi-stimulus CCA (msCCA). Its key idea is to use the data corresponding to not only the target stimulus but also its adjacent stimuli to construct the target CCA spatial filter.
Chen et al. [29] proposed filter bank CCA (FBCCA), which uses filter banks to enhance the performance of CCA. The original EEG signal $X$ is decomposed into M (M = 5 in this paper) sub-bands by M different bandpass filters. For the m-th sub-band ([8m, 90] Hz in this paper), a correlation coefficient $\tau_{k,m}$ is computed by (2). A weighted sum of those correlation coefficients is then calculated as the final correlation coefficient between $X$ and the stimulus frequency $f_k$:

$\tau_k = \sum_{m=1}^{M} w(m) \cdot \tau_{k,m}^2, \quad w(m) = m^{-a} + b,$  (7)

where a and b are constants (a = 1.25 and b = 0.25 in [29]).

C. TRCA

Tanaka et al. [30] first proposed TRCA to process near-infrared spectroscopy data. Nakanishi et al. [16] then extended it to SSVEP-based BCIs. The main idea is to extract the task-related components by maximizing their reproducibility during the task period. When applied to SSVEP-based BCIs, TRCA maximizes the reproducibility among multiple trials to improve the signal-to-noise ratio and suppress background electrical activities.
Let the calibration trials of the k-th stimulus frequency be $\{X_{k,i}\}_{i=1}^{N_{cali}}$, where $X_{k,i} \in \mathbb{R}^{N_{ch} \times N_s}$. TRCA maximizes the inter-trial covariance after spatial filtering to extract the frequency-related components:

$\max_{w_k} \sum_{\substack{i, i' = 1 \\ i \ne i'}}^{N_{cali}} \operatorname{Cov}\left( w_k^\top X_{k,i},\; w_k^\top X_{k,i'} \right).$  (8)

The TRCA spatial filter $w_k$ corresponding to frequency $f_k$ can be derived by solving the following optimization problem:

$w_k = \arg\max_{w} \frac{w^\top S_k w}{w^\top Q_k w},$  (9)

where

$S_k = \sum_{\substack{i, i' = 1 \\ i \ne i'}}^{N_{cali}} X_{k,i} X_{k,i'}^\top, \qquad Q_k = \sum_{i=1}^{N_{cali}} X_{k,i} X_{k,i}^\top.$  (10)

Ensemble TRCA (eTRCA) [16] assembles all $\{w_k\}_{k=1}^{N_f}$ above into a common spatial filtering matrix for all stimuli, instead of using a separate spatial filter for each stimulus:

$W = [w_1, w_2, \ldots, w_{N_f}] \in \mathbb{R}^{N_{ch} \times N_f}.$  (11)

eTRCA then calculates the two-dimensional correlation between an EEG trial $X$ and the subject-specific template $\bar{X}_k$ in (5), after spatial filtering:

$\tau_k = \rho\left( W^\top X,\; W^\top \bar{X}_k \right).$  (12)

Finally, the stimulus frequency of $X$ is again determined by (4).
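The TRCA objective (9) reduces to a generalized eigenproblem. A minimal sketch under our reading of $S_k$ and $Q_k$ (function name ours; the cross-term trick avoids the explicit double sum):

```python
import numpy as np
from scipy.linalg import eigh

def trca_filter(trials):
    """TRCA spatial filter for one stimulus; trials: list of (C x N) arrays."""
    trials = [X - X.mean(axis=1, keepdims=True) for X in trials]
    Q = sum(X @ X.T for X in trials)   # within-trial covariance sum
    summed = sum(trials)
    S = summed @ summed.T - Q          # leaves only the i != i' cross terms
    _, vecs = eigh(S, Q)               # generalized symmetric eigenproblem
    return vecs[:, -1]                 # eigenvector of the largest eigenvalue
```

Solving this once per stimulus and stacking the resulting filters column-wise gives the eTRCA matrix W in (11).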

III. SMALL DATA LST (SD-LST)
This section first introduces the quick calibration scenario, and then describes our proposed sd-LST.

A. Quick Calibration Scenario
We consider the following quick calibration scenario:
1) The SSVEP-based BCI has $N_f$ stimulus frequencies.
2) Source subjects: There are $N_{sub}$ source subjects, each with $N_{block}$ labeled SSVEP trials for each stimulus. Denote the trials of the j-th ($j \in [1, N_{sub}]$) source subject as

$\left\{ {}^s_j X_{k,i} \right\}, \quad k = 1, \ldots, N_f, \; i = 1, \ldots, N_{block},$  (13)

where each ${}^s_j X_{k,i}$ is an $N_{ch} \times N_s$ matrix.
3) Target subject: Only K of the $N_f$ ($K \ll N_f$) stimuli can be selected for calibration, and each selected stimulus has $N_{cali}$ (a small number) SSVEP calibration trials. Denote the full calibration data as

$\left\{ {}^t X_{a_k,i} \right\}, \quad k = 1, \ldots, K, \; i = 1, \ldots, N_{cali},$  (14)

where $a_k \in [1, N_f]$ ($k = 1, \ldots, K$) is a stimulus index, and the indices satisfy $f_{a_1} < f_{a_2} < \ldots < f_{a_K}$.
We call this a quick calibration scenario because, instead of using all $N_f$ stimuli as in conventional calibration, only K stimuli are used. Since usually $K \ll N_f$, the calibration is much faster.
Intuitively, the K selected frequencies should be distributed uniformly among the $N_f$ frequencies [26], e.g.,

$a_k = \operatorname{round}\left( (k-1)\,\frac{N_f - 1}{K - 1} \right) + 1, \quad k = 1, \ldots, K.$  (15)
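For illustration, one way to pick K roughly evenly spaced stimulus indices (this helper and its rounding rule are our own sketch of the uniform-selection idea in [26]):

```python
import numpy as np

def select_stimuli(n_f, k):
    """Pick k stimulus indices spread uniformly over 1..n_f (1-based)."""
    return np.round(np.linspace(1, n_f, k)).astype(int)

# For a 40-target speller with K = 8, the indices run from 1 to 40,
# strictly increasing and roughly evenly spaced.
idx = select_stimuli(40, 8)
```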

B. Source Data Transformation
Large variations exist among different subjects' data, leading to poor performance when the source data are used directly in the target domain. LST [23], which constructs a transformation matrix between the target subject and each source subject for each stimulus frequency, has been used to reduce the cross-subject variations. However, it cannot be used directly in our quick calibration scenario, because not all stimulus frequencies of the target subject have calibration data. Our source data transformation procedure, illustrated in Fig. 1, consists of two LST steps.
Instead of constructing a separate transformation matrix for each stimulus frequency, sd-LST constructs a common transformation matrix ${}^s_j P \in \mathbb{R}^{N_{ch} \times N_{ch}}$ for all stimulus frequencies of the j-th source subject, by solving the following problem:

${}^s_j P = \arg\min_{P} \sum_{k=1}^{K} \left\| {}^t\bar{X}_{a_k} - P \, {}^s_j\bar{X}_{a_k} \right\|_F^2,$  (16)

in which ${}^s_j\bar{X}_{a_k}$ and ${}^t\bar{X}_{a_k}$ are the subject-specific templates for frequency $f_{a_k}$ of the j-th source subject and the target subject, respectively, computed similarly to (5). Note that only the K templates from the source subject, and their counterparts from the target subject, are used to construct the transformation matrix.
The closed-form solution of (16) is

${}^s_j P = \left( \sum_{k=1}^{K} {}^t\bar{X}_{a_k} \, {}^s_j\bar{X}_{a_k}^\top \right) \left( \sum_{k=1}^{K} {}^s_j\bar{X}_{a_k} \, {}^s_j\bar{X}_{a_k}^\top \right)^{-1}.$  (17)

Then, ${}^s_j P$ is applied to all trials of the j-th source subject, i.e., ${}^s_j X_{k,i} \leftarrow {}^s_j P \cdot {}^s_j X_{k,i}$.
After all trials from all source subjects have been transformed, sd-LST combines all source subjects' transformed data into a large dataset:

${}^s D = \left\{ {}^s X_{k,i} \right\} = \bigcup_{j=1}^{N_{sub}} \left\{ {}^s_j X_{k,i} \right\}.$  (18)

sd-LST applies LST to ${}^s D$ again to bring the source data even closer to the target data, and the new transformation matrix ${}^s P$ is computed by

${}^s P = \arg\min_{P} \sum_{k=1}^{K} \left\| {}^t\bar{X}_{a_k} - P \, {}^s\bar{X}_{a_k} \right\|_F^2,$  (19)

in which ${}^s\bar{X}_{a_k}$ is the subject-specific template for frequency $f_{a_k}$, averaged over all source subjects.
The final source dataset is obtained by applying ${}^s P$ to each source trial, i.e., ${}^s X_{k,i} \leftarrow {}^s P \, {}^s X_{k,i}$.
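Both LST steps solve the same least-squares problem and share the closed-form solution (17). A minimal NumPy sketch (function name ours; templates are channel-by-sample arrays):

```python
import numpy as np

def lst_matrix(target_templates, source_templates):
    """Common LST matrix P minimizing sum_k ||T_k - P @ S_k||_F^2.

    Closed form: P = (sum_k T_k S_k^T) (sum_k S_k S_k^T)^{-1}.
    """
    num = sum(T @ S.T for T, S in zip(target_templates, source_templates))
    den = sum(S @ S.T for S in source_templates)
    return num @ np.linalg.inv(den)
```

If the target templates happen to be an exact linear mixture of the source templates, P recovers that mixing matrix; otherwise it is the least-squares fit over the K available frequencies.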

C. SSVEP Recognition
The combination of TRCA-based and CCA-based filters has proven effective [26]. Thus, sd-LST uses the dataset ${}^s D$ after source data transformation to construct TRCA-based and CCA-based spatial filters and stimulus-specific templates:

1) TRCA-based spatial filters: According to (9) and (11), an eTRCA spatial filtering matrix $W$ can be computed from ${}^s D$. Then, for a new target trial $X$, a TRCA-based correlation coefficient corresponding to the k-th stimulus frequency can be calculated as

$\tau_{k,0} = \rho\left( W^\top X,\; W^\top \bar{X}_k \right),$

where $\bar{X}_k$ is the template for frequency $f_k$, averaged over all trials of ${}^s D$ with that frequency.

2) CCA-based spatial filters: Similar to eCCA, two sets of CCA spatial filters are constructed, i.e., $w_X$ and $w_{Y_k}$ between $X$ and $Y_k$, as well as $w_{\bar{X}_k}$ and $w'_{Y_k}$ between $\bar{X}_k$ and $Y_k$. Then, three CCA-based correlation coefficients can be obtained:

$\tau_{k,1} = \rho\left( w_X^\top X,\; w_{Y_k}^\top Y_k \right), \quad \tau_{k,2} = \rho\left( w_X^\top X,\; w_X^\top \bar{X}_k \right), \quad \tau_{k,3} = \rho\left( w_{\bar{X}_k}^\top X,\; w_{\bar{X}_k}^\top \bar{X}_k \right).$

3) Target recognition: For the k-th stimulus, a weighted sum of the above four correlation coefficients is calculated for target recognition. We combine the TRCA-based coefficient and the average of the three CCA-based coefficients to balance the two strategies:

$\tau_k = \tau_{k,0} + \frac{1}{3} \sum_{i=1}^{3} \tau_{k,i}.$  (34)

Finally, the stimulus with the maximum $\tau_k$ is identified as the target stimulus, according to (4).
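The final decision step can be sketched as follows, assuming the coefficients have already been computed and using the equal weighting of the two strategies described above (function name ours):

```python
import numpy as np

def recognize(tau_trca, tau_cca):
    """tau_trca: (N_f,) TRCA coefficients; tau_cca: (N_f, 3) CCA coefficients.

    Returns the index of the recognized stimulus.
    """
    tau = tau_trca + tau_cca.mean(axis=1)   # TRCA term + average CCA term
    return int(np.argmax(tau))              # decision rule (4)
```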

IV. EXPERIMENTS
This section validates the performance of sd-LST on three SSVEP datasets.

A. SSVEP Datasets
Three publicly available SSVEP datasets were used in this study:

1) Dataset I: This dataset was first introduced by Wang et al. [31] in 2016, and consists of 64-channel EEG signals collected from 35 healthy subjects. The SSVEP experiments included six blocks. In each block, the subjects were asked to stare at 40 stimuli (flickering at frequencies from 8 to 15.8 Hz with an interval of 0.2 Hz) in random order.

2) Dataset II: This dataset was introduced by Liu et al. [32] in 2020, and includes 64-channel EEG signals from 70 subjects. The stimuli and frequency range were the same as those in Dataset I, but the keyboard layout was different. Each subject had four blocks of EEG signals, each with 40 trials corresponding to the 40 stimuli.

3) Dataset III: This dataset was first used by Nakanishi et al. [33] in 2015. Ten subjects participated in the experiment, each contributing 15 blocks of EEG signals. In each block, there were 12 trials corresponding to 12 stimuli (flickering frequencies from 9.25 to 14.75 Hz with an interval of 0.5 Hz).

Table II summarizes the main characteristics of the three datasets.

B. Data Preprocessing
To account for the latency of the visual system [34], we extracted the EEG signal in the window $[t_L, t_L + T_w]$ s after each stimulus onset, where $t_L$ is the SSVEP latency ($t_L = 0.14$ s in this paper) and $T_w$ is the time-window length.
Nine electrodes (Pz, PO5, PO3, POz, PO4, PO6, O1, Oz, and O2) around the occipital area were chosen for Datasets I and II, and all electrodes were used for Dataset III.
After data segmentation, a filter bank was applied to all trials to improve the SSVEP recognition accuracy.
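A filter-bank decomposition like the one used by FBCCA (Section II) might be sketched as follows; the sub-band edges [8m, 90] Hz follow the paper, while the Butterworth order and the weighting constants a = 1.25, b = 0.25 are assumptions taken from [29]:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def filter_bank(X, fs, n_bands=5):
    """Decompose X (channels x samples) into sub-bands [8m, 90] Hz, m = 1..n_bands."""
    bands = []
    for m in range(1, n_bands + 1):
        b, a = butter(4, [8.0 * m, 90.0], btype="bandpass", fs=fs)
        bands.append(filtfilt(b, a, X, axis=-1))
    return bands

def fbcca_score(taus, a=1.25, b=0.25):
    """Weighted combination of sub-band coefficients: sum_m (m^-a + b) * tau_m^2."""
    return sum((m ** -a + b) * tau ** 2 for m, tau in enumerate(taus, start=1))
```

Note that the sampling rate must exceed 180 Hz for the 90 Hz band edge to be valid.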

C. Performance Evaluation
Our main performance measures were the classification accuracy (ACC) and the information transfer rate (ITR). The ITR (in bits/min) is computed by

$ITR = \frac{60}{T} \left[ \log_2 N_f + P \log_2 P + (1 - P) \log_2 \frac{1 - P}{N_f - 1} \right],$

where $P$ is the ACC, and $T = T_w + 0.5$ is the average time used for each target recognition (0.5 s is the gaze-shifting time).
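The ITR formula above can be computed as follows (a minimal sketch; chance-level or worse accuracy is clipped to zero):

```python
import numpy as np

def itr(p, n_targets, t_sec):
    """Information transfer rate in bits/min."""
    if p <= 1.0 / n_targets:       # at or below chance level
        return 0.0
    bits = np.log2(n_targets)
    if p < 1.0:                    # the entropy terms vanish at p = 1
        bits += p * np.log2(p) + (1 - p) * np.log2((1 - p) / (n_targets - 1))
    return 60.0 / t_sec * float(bits)

# Perfect accuracy on a 40-target speller with T = 0.7 + 0.5 = 1.2 s
print(round(itr(1.0, 40, 1.2), 1))  # 266.1
```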
We used leave-one-subject-out cross-validation to evaluate the performances, where one subject was treated as the target subject, and all others as the source subjects.

D. Effect of N cali
N cali , the number of calibration trials for each stimulus frequency, greatly impacts the performance of sd-LST. Intuitively, a larger N cali increases the ACC, but also the calibration time.
For fast calibration, we should keep N all (N all = K × N cali ) small.
We evaluated the performance of sd-LST for different values of $N_{cali}$. For each target subject, we selected $N_{cali} \in [1, N_{block} - 1]$ trials from the $N_{block}$ trials of each stimulus frequency in the dataset. We repeated the experiments $\binom{N_{block}}{N_{cali}}$ times to enumerate all possible combinations of the blocks. For a given $N_{cali}$, K was increased from 1 to $N_f$ in steps of 1, and consequently $N_{all}$ was increased from $N_{cali}$ to $N_f \cdot N_{cali}$.
The results, averaged across T w ∈ {0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}s, are shown in Fig. 2. For a fixed N all , a smaller N cali gave better performance. When K = 40, different N cali achieved similar performances. Thus, it is suggested to use N cali = 1 for best performance and fast calibration.

E. Effects of T w , N all and N sub
T w and N all determine the amount of target calibration data that sd-LST can use to construct the spatial filters and the templates, respectively. N sub is the number of available source subjects. It is interesting to study how they impact the calibration performance.
Consider $T_w$ and $N_{all}$ first. We kept $N_{cali} = 1$ in leave-one-block-out cross-validation, where one block was treated as the target calibration data, and the remaining $N_{block} - 1$ blocks as the test data. Then, K trials were selected from the target block according to (15). The parameter settings are shown in Table III.
The first two columns of Fig. 3 show the performances of sd-LST with different $T_w$ (averaged across different $N_{all}$) and different $N_{all}$ (averaged across different $T_w$). The error bars indicate the standard deviations. The ACC and ITR increased as $N_{all}$ increased. However, the impacts of $N_{all}$ on both ACC and ITR were limited once $N_{all}$ was large enough ($N_{all} > 10$ on Datasets I and II, and $N_{all} > 6$ on Dataset III), indicating that sd-LST needs only very few calibration data. $T_w = 0.7$ s achieved the highest or second highest ITR on all three datasets, so it is recommended.
The parameter settings for studying the effect of N sub are shown in Table IV. 30 repeats were used for each individual N sub , and the N sub source subjects in each repeat were randomly selected. The results are shown in the third column of Fig. 3. The ACC and ITR increased steadily as N sub increased; thus, we can safely use all source subjects.

F. Effect of the Second LST
After the first LST, we combine the data from all source subjects into a new source domain and perform another LST to bring the source data closer to the target data. Fig. 4 shows the necessity of the second LST. Although the ACC improvement was not very large, paired t-tests showed statistically significant differences for almost every K on all three datasets, suggesting that the second LST is indeed necessary and beneficial.

G. Effect of the Spatial Filters

After source data transformation, both TRCA-based filters and CCA-based filters are used in the final SSVEP recognition [see (34)]. To show the necessity of this combination, we compared (34) with the following variants:
1) TRCA+CCA, which used both TRCA-based and CCA-based filters, but no data transformation.
2) LST-TRCA, which used data transformation, but only TRCA-based filters in the final recognition.
3) LST-CCA, which used data transformation, but only CCA-based filters in the final recognition.
The ITRs are shown in Fig. 5, averaged across different subjects and $T_w \in \{0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0\}$ s. The shaded areas indicate the standard deviations. sd-LST performed much better than TRCA+CCA with a very small amount of target data, demonstrating the effectiveness of the source data transformation. The ITRs of sd-LST were always the best, or very close to the best. Thus, it is generally beneficial to use both TRCA-based and CCA-based filters in the final recognition, as in (34).

H. Performance Comparison With Other Approaches
Finally, we compared sd-LST with the following five classical or state-of-the-art approaches:
1) sCCA [2], a popular calibration-free algorithm.
2) tt-CCA [22], a transfer-template based algorithm, which requires no calibration data from the target subject but some auxiliary data from source subjects.
3) eTRCA [16], a widely-used calibration-based algorithm, which requires $N_{cali} \ge 2$ and $K = N_f$. For each subject, we repeated the experiment $\binom{N_{block}}{2}$ times to enumerate all possible combinations of the blocks.
4) msCCA [25], a state-of-the-art calibration-based algorithm, which requires $N_{cali} \ge 1$ and $K = N_f$.
5) stCCA [26], a state-of-the-art transfer-based algorithm, requiring only a few calibration trials from the target subject. We set $N_{cali} = 1$ for all three datasets.
The last two algorithms used leave-one-block-out cross-validation, and K trials were selected from the target block according to (15).
The ITRs for different K are shown in Fig. 6, averaged across different subjects and $T_w \in \{0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0\}$ s. The ITR of sd-LST was always higher than that of stCCA and increased faster with K, particularly on Dataset II. sd-LST outperformed sCCA and tt-CCA even when only one target calibration trial was used, and needed only a few calibration trials to outperform eTRCA and msCCA. In summary, when sufficient source data are available, sd-LST needs only a few calibration trials from the new subject to achieve state-of-the-art performance.
We used paired t-tests to check whether the performance differences between sd-LST and the other approaches were statistically significant. The p-values are shown in Fig. 7, where each row in a subfigure shows the results between sd-LST and a specific baseline for different K. Black and gray colors indicate that the ITRs of the two approaches had no statistically significant difference (p ≥ 0.05). Blue (red) indicates that the ITR of sd-LST was statistically significantly lower (higher) than that of the baseline. In most cases, sd-LST had statistically significant performance improvements over the other approaches. Table V shows the highest average ITRs across subjects for the different approaches, together with their optimal hyper-parameters. sd-LST outperformed stCCA with the same hyper-parameters, and outperformed eTRCA and msCCA with far fewer calibration trials. Fig. 8 shows the detailed results on the individual subjects; sd-LST performed the best on most subjects.
In summary, our proposed sd-LST is effective and stable for fast SSVEP calibration.

V. CONCLUSION
SSVEP is one of the most popular BCI paradigms, due to its high information transfer rate and signal-to-noise ratio. Many calibration-free and calibration-based approaches have been proposed to improve the performance of SSVEP-based BCIs. This paper considered a quick calibration scenario, where there are plenty of data from multiple source subjects, but only a small number of calibration trials from a subset of stimulus frequencies for the new subject, and proposed sd-LST to solve this problem.
Experiments on three SSVEP datasets demonstrated that sd-LST outperformed calibration-free approaches like sCCA and tt-CCA, with only one calibration EEG trial. It also outperformed widely-used calibration-based approaches like eTRCA and msCCA, with only a few calibration trials. sd-LST always achieved higher ITRs than stCCA in the same quick calibration scenario.
In conclusion, our proposed sd-LST is a very promising approach for quick calibration: it only needs about 10 calibration trials for 40-target SSVEP-based BCI spellers to achieve higher ITRs than many other approaches.
Our future research will investigate more challenging calibration scenarios, e.g., the target subject has a greater number of stimulus frequencies than the source subjects, or the source and target subjects use different stimulus frequencies.