Cross-Subject Transfer Learning for Boosting Recognition Performance in SSVEP-Based BCIs

Steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs) have been studied extensively in recent years due to their fast communication rate and high signal-to-noise ratio. Transfer learning is typically utilized to improve the performance of SSVEP-based BCIs with auxiliary data from the source domain. This study proposes an inter-subject transfer learning method for enhancing SSVEP recognition performance through transferred templates and transferred spatial filters. In our method, the spatial filter is trained via multiple covariance maximization to extract SSVEP-related information; the relationships between the training trials, the individual template, and the artificially constructed reference are all involved in the training process. The spatial filters are applied to these templates to form two new transferred templates, and the corresponding transferred spatial filters are obtained via least-squares regression. The contribution score of each source subject is calculated based on the distance between that source subject and the target subject. Finally, a four-dimensional feature vector is constructed for SSVEP detection. To demonstrate the effectiveness of the proposed method, a publicly available dataset and a self-collected dataset were employed for performance evaluation. Extensive experimental results validate the feasibility of the proposed method for improving SSVEP detection.

By analyzing the information in the measured SSVEP signals, the visual stimulus that the user is gazing at can be detected, and the corresponding control command can be output accordingly [11]. In recent years, many target recognition methods have been proposed for SSVEP-based BCI systems. Canonical correlation analysis (CCA) is the most popular method for classifying stimuli due to its ease of use and robustness [12], [13]. However, as a training-free method, its performance is easily influenced by interference from spontaneous brain activities. To alleviate this issue, many improved approaches have been proposed for SSVEP detection. In the direction of template optimization, representative examples include the L1-regularized multiway CCA (L1-MwayCCA) [14], multiset CCA (MsetCCA) [15], individual template-based CCA (ITCCA) [16], and multi-layer correlation maximization (MCM) [17]. Alternatively, several spatial filtering methods have been reported to lower the misclassification rate in SSVEP detection, such as a combination of CCA and ITCCA [18], the sum of squared correlations (SSCOR) [19], and task-related component analysis (TRCA) [20].
Although the performance of SSVEP-based BCI systems has been significantly boosted by these template- or spatial-filter-based methods, EEG usually suffers from inter-subject variability and non-stationarity problems [21]. Therefore, trained templates or spatial filters can only be used for a single subject, and it is difficult to transfer knowledge directly across subjects. This hinders the broad and practical use of BCIs in real life. Recently, the transfer learning (TL) technique has been explored in BCIs to transfer knowledge from old sessions or subjects (the source domain) to new sessions or subjects (the target domain) so that the performance in the target domain can be boosted [22], [23]. As one research direction, training data are usually transferred across domains to augment the calibration data available for new users [24], [25]. Template-based transfer learning is also a popular study area; representative approaches include transfer template-based canonical correlation analysis (tt-CCA) [26], adaptive combined-CCA (Adaptive-C3A) [27], and inter- and intra-subject template-based multivariate synchronization index (IIST-MSI) [28]. In these methods, the transferred template is simply generated by averaging multiple trials from source subjects, which may not contain sufficient SSVEP features. Alternatively, multiple BCI transfer learning studies operate on spatial filters to learn common feature representations across different domains [13], [29]. Liu et al. [30] proposed an all-to-one method that uses data from all source subjects to train TRCA-based spatial filters. Wang et al. [31] presented an inter-subject maximal correlation method to improve the robustness of SSVEP classification.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Wong et al. [32] proposed a subject-transfer-based CCA method that utilizes knowledge both within and between subjects. However, these methods rarely consider the correlations among the training data, the individual template, and the predefined sine-cosine signal simultaneously to enhance the effectiveness of the spatial filter [33].
In this study, we aimed to explore and exploit a transfer learning architecture to improve recognition performance in SSVEP-based BCI systems. The main contributions of this paper are as follows: 1) a cross-subject scheme is proposed that incorporates SSVEP knowledge from source subjects to effectively strengthen recognition performance for the target subject; 2) a powerful and informative feature vector is constructed under this scheme, driven partly by the transferred spatial filters and transferred SSVEP templates from the source subjects, and partly by the spatial filter of the target subject obtained via multiple covariance maximization; 3) a contribution score is introduced for each source subject by further exploiting the distance between the source subject and the target subject. The performance of the proposed method was validated on a publicly available 40-class dataset [34] and a self-collected 12-class dataset. Extensive evaluations were conducted to demonstrate its effectiveness in comparison with several well-known methods; its efficiency and reliability were demonstrated with average classification accuracies of 89.98% and 94.61% on the two datasets, respectively. This paper is organized as follows: Section II introduces the SSVEP datasets and the proposed method; Section III presents the experimental results; the discussion and conclusion are given in Sections IV and V, respectively.

A. SSVEP Datasets
In this study, the proposed method and the compared methods were evaluated on a publicly available benchmark dataset [34] and a self-collected SSVEP dataset. The benchmark dataset was recorded from thirty-five healthy participants. The user interface includes forty visual stimuli, which were coded using a joint frequency and phase modulation (JFPM) method. The frequencies range from 8 Hz to 15.8 Hz with a 0.2 Hz interval, and there is a 0.5π phase difference between two neighboring stimuli. For each subject, the experiment contains six blocks, and each block consists of forty trials corresponding to the forty stimuli. More details about the benchmark dataset can be found in [34]. Information about the self-collected dataset is given below. Hereafter, the two datasets are referred to as Dataset I and Dataset II.
1) Participants: In Dataset II, eleven healthy subjects (five females and six males, mean age: twenty-five years) took part in the experiment. All participants had normal or corrected-to-normal vision. The experiment was approved by the Research Ethics Committee of the University of Leeds, and each participant read and signed an informed consent form.
2) Visual Stimulus Presentation: In Dataset II, a 4 × 3 stimulus matrix was presented on a 23.6-inch LCD monitor with a resolution of 1920 × 1080 pixels and a refresh rate of 60 Hz. Twelve stimuli were coded using the JFPM approach. The frequencies ranged from 9.25 Hz to 14.75 Hz with an interval of 0.5 Hz, and the phases ranged from 0π to 1.5π with an interval of 0.5π. For each subject, the experiment included five blocks, and each block contained twelve trials corresponding to the twelve visual stimuli. Each trial began with a 0.5 s target cue (a red dot). After the cue, all targets flickered simultaneously for 5 s. Subjects were required to focus on the target stimulus and to avoid eye movements, and could rest between two neighboring blocks. Fig. 1 illustrates the SSVEP experimental paradigm.
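As a quick sanity check of the stimulus design described above, the frequency and phase grids can be enumerated directly. The pairing of frequencies with phases and their on-screen layout order are assumptions here, since the text does not specify them:

```python
import numpy as np

# Hypothetical enumeration of the 12-target JFPM coding in Dataset II:
# frequencies 9.25-14.75 Hz in 0.5 Hz steps; phases cycling through
# 0, 0.5*pi, 1.0*pi, 1.5*pi (the pairing order is assumed).
freqs = 9.25 + 0.5 * np.arange(12)             # Hz
phases = (0.5 * np.pi) * (np.arange(12) % 4)   # rad
```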
3) SSVEP Signal Recording: In Dataset II, data were recorded with equipment from g.tec medical engineering GmbH, and the SSVEP signals were sampled at 256 Hz by a g.USBamp amplifier. SSVEP responses mainly appear over the parietal and occipital regions, since these areas lie closest to the visual cortex of the human brain [35], [36], [37]. Some studies have shown that SSVEP signals near these areas have larger amplitude and SNR [34], [38]. Therefore, nine electrodes (Pz, PO3, POz, PO4, PO7, O1, Oz, O2, and PO8) located over the parietal and occipital areas were used to record EEG signals. The ground electrode and reference electrode were placed at FPz and on the right earlobe, respectively.

B. Data Preprocessing
Due to the effect of visual latency in the human visual system, the data were extracted in the window [0.14, 0.14 + d] s, where d refers to the data length selected for performance analysis. For Dataset II, the data were band-pass filtered between 8 Hz and 40 Hz with a Chebyshev Type I infinite impulse response (IIR) filter.

Fig. 2. The diagram of the cross-subject transfer learning method for enhancing SSVEP detection. For the i-th stimulus, the spatial filters for the n-th source subject, ŵ⁽ⁿ⁾ᵢ, and for the target subject, ẅᵢ, are first calculated by maximizing the correlations among the three kinds of signals (training trials, the individual template, and the reference signal) via (1)-(13). The transferred templates Iⁿᵢ, Rⁿᵢ and the transferred spatial filters Ŝⁿᵢ, T̂ⁿᵢ are then obtained via (14)-(19). The contribution scores pⁿ,¹ᵢ, pⁿ,²ᵢ are assigned to the correlation coefficients of the n-th source subject via (22)-(25). Finally, the four-dimensional feature vector ρᵢ is formed by (26) and the recognition result is determined via (27)-(28).
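A minimal sketch of this preprocessing step in Python, assuming SciPy; the filter order and passband ripple are not stated in the text, so the values below are assumptions:

```python
import numpy as np
from scipy.signal import cheby1, filtfilt

def bandpass_ssvep(eeg, fs=256, low=8.0, high=40.0, order=4, ripple=0.5):
    """Chebyshev Type I band-pass filter, applied forward-backward
    (zero phase). eeg has shape (n_channels, n_samples); the order and
    ripple (dB) are assumed values, not taken from the paper."""
    b, a = cheby1(order, ripple, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, eeg, axis=-1)
```

Calling `bandpass_ssvep(trial)` on a multichannel trial returns a filtered array of the same shape.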

C. The Proposed Method
Assume that the four-dimensional EEG tensor is denoted as $\chi \in \mathbb{R}^{N_t \times N_f \times N_c \times N_s}$, where $N_t$ represents the number of training trials, $N_f$ the number of visual stimuli, $N_c$ the number of channels, and $N_s$ the number of samples. Hereafter, $i$ and $j$ refer to the indices of the stimulus and the training trial, respectively. Therefore, the two-way tensor $\chi_{i,j} \in \mathbb{R}^{N_c \times N_s}$ represents the individual EEG signal for the $i$-th stimulus and the $j$-th training trial. The concatenated training data is denoted as $\tilde{\chi}_i = [\chi_{i,1}, \chi_{i,2}, \ldots, \chi_{i,N_t}] \in \mathbb{R}^{N_c \times (N_t \cdot N_s)}$, constructed by concatenating the $N_t$ training trials. The individual template is obtained by averaging the training trials:

$$\bar{\chi}_i = \frac{1}{N_t} \sum_{j=1}^{N_t} \chi_{i,j}.$$

SSVEP signals can also be characterized by a series of artificial sine-cosine waves, so the reference signal $Y_i \in \mathbb{R}^{2N_h \times N_s}$ is defined as

$$Y_i = \begin{bmatrix} \sin(2\pi f t) \\ \cos(2\pi f t) \\ \vdots \\ \sin(2\pi N_h f t) \\ \cos(2\pi N_h f t) \end{bmatrix}, \qquad t = \frac{1}{F_s}, \frac{2}{F_s}, \ldots, \frac{N_s}{F_s},$$

where $N_h$ is the number of harmonics, $F_s$ the sampling rate, and $f$ the visual stimulation frequency. The spatial filter is computed by jointly maximizing the inter-trial covariance, the covariance between the training trials and the individual template, the covariance between the training trials and the artificial reference, and the covariance between the individual template and the artificial reference.
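The reference construction can be sketched as follows (the time axis is assumed to start at t = 0, and the function name is ours):

```python
import numpy as np

def sine_cosine_reference(f, n_samples, fs, n_harmonics):
    """Artificial reference Y_i of shape (2*N_h, N_s): stacked
    sine/cosine pairs at the stimulus frequency and its harmonics."""
    t = np.arange(n_samples) / fs
    rows = []
    for h in range(1, n_harmonics + 1):
        rows.append(np.sin(2 * np.pi * h * f * t))
        rows.append(np.cos(2 * np.pi * h * f * t))
    return np.stack(rows)
```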
Therefore, the covariance matrix $C$ can be represented as

$$C = \begin{bmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{bmatrix},$$

where $C_{11}$ denotes the inter-trial covariance,

$$C_{11} = \sum_{\substack{j_1, j_2 = 1 \\ j_1 \neq j_2}}^{N_t} \chi_{i,j_1} \chi_{i,j_2}^{T},$$

$C_{12}$ and $C_{21}$ refer to the covariance between the SSVEP training trials and the individual template,

$$C_{12} = C_{21}^{T} = \sum_{j=1}^{N_t} \chi_{i,j} \bar{\chi}_{i}^{T},$$

the similarity between the training trials and the artificially constructed reference is incorporated through $C_{13}$ and $C_{31}$,

$$C_{13} = C_{31}^{T} = \sum_{j=1}^{N_t} \chi_{i,j} Y_{i}^{T},$$

$C_{23}$ and $C_{32}$ are the covariance between the individual template and the reference signal,

$$C_{23} = C_{32}^{T} = \bar{\chi}_{i} Y_{i}^{T},$$

and $C_{22}$ and $C_{33}$ are the self-covariances

$$C_{22} = \bar{\chi}_{i} \bar{\chi}_{i}^{T}, \qquad C_{33} = Y_{i} Y_{i}^{T}.$$

The objective function is therefore

$$\max_{w_i} \; w_i^{T} C w_i,$$

subject to the constraint $w_i^{T} Q w_i = 1$, where the covariance matrix $Q$ is block-diagonal,

$$Q = \begin{bmatrix} Q_{11} & 0 & 0 \\ 0 & C_{22} & 0 \\ 0 & 0 & C_{33} \end{bmatrix}, \qquad Q_{11} = \sum_{j=1}^{N_t} \chi_{i,j} \chi_{i,j}^{T}.$$

The constrained optimization problem can thus be formulated as

$$\hat{w}_i = \arg\max_{w_i} \frac{w_i^{T} C w_i}{w_i^{T} Q w_i},$$

and $\hat{w}_i$ is obtained as the eigenvector of the matrix $Q^{-1} C$ corresponding to the largest eigenvalue. The $N_f$ spatial filters are concatenated to form the ensemble spatial filter $W = [\hat{w}_1, \hat{w}_2, \ldots, \hat{w}_{N_f}]$.

Hereafter, a variable with a right superscript $n$ $(n = 1, 2, \ldots, N_{sub})$ indicates that it is provided by the $n$-th source subject, where $N_{sub}$ is the number of transferred source subjects. The two kinds of transferred templates, i.e., the transferred individual template $I^n_i \in \mathbb{R}^{N_f \times N_s}$ and the transferred reference template $R^n_i \in \mathbb{R}^{N_f \times N_s}$, provided by the $n$-th source subject are obtained by applying the corresponding parts of the ensemble spatial filter to the individual template $\bar{\chi}^n_i$ and the reference signal $Y_i$, respectively. Let a variable with a double-dot superscript denote that it is provided by the target subject. The transferred spatial filters $\hat{s}^n_{ij}$ and $\hat{t}^n_{ij}$ for the $i$-th stimulus and the $j$-th training trial, corresponding to the two kinds of transferred templates, are calculated by solving

$$\hat{s}^n_{ij} = \arg\min_{s} \left\| s^{T} \ddot{\chi}_{i,j} - I^n_i \right\|_F^2, \qquad \hat{t}^n_{ij} = \arg\min_{t} \left\| t^{T} \ddot{\chi}_{i,j} - R^n_i \right\|_F^2,$$

which can be estimated via least-squares regression [29]:

$$\hat{s}^n_{ij} = \left( \ddot{\chi}_{i,j} \ddot{\chi}_{i,j}^{T} \right)^{-1} \ddot{\chi}_{i,j} \left( I^n_i \right)^{T}, \qquad \hat{t}^n_{ij} = \left( \ddot{\chi}_{i,j} \ddot{\chi}_{i,j}^{T} \right)^{-1} \ddot{\chi}_{i,j} \left( R^n_i \right)^{T}.$$

The final transferred spatial filters $\hat{S}^n_i$ and $\hat{T}^n_i$ provided by the $n$-th source subject are obtained by averaging over all training trials:

$$\hat{S}^n_i = \frac{1}{N_t} \sum_{j=1}^{N_t} \hat{s}^n_{ij}, \qquad \hat{T}^n_i = \frac{1}{N_t} \sum_{j=1}^{N_t} \hat{t}^n_{ij}.$$
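The two numerical kernels of this section, the constrained covariance maximization and the least-squares estimation of a transferred filter, can be sketched as below. This is an illustrative fragment under our own naming; the construction of the full block matrices C and Q is not reproduced here:

```python
import numpy as np
from scipy.linalg import eigh, lstsq

def top_generalized_eigvec(C, Q):
    """Solve max_w w^T C w subject to w^T Q w = 1: the generalized
    eigenvector of (C, Q) with the largest eigenvalue (C symmetric,
    Q symmetric positive definite)."""
    vals, vecs = eigh(C, Q)            # eigenvalues in ascending order
    w = vecs[:, -1]
    return w / np.sqrt(w @ Q @ w)      # enforce the unit-variance constraint

def transferred_filter(trial, template):
    """Least-squares filter s with s^T @ trial ~= template.
    trial: (N_c, N_s); template: (N_f, N_s); returns (N_c, N_f)."""
    s, *_ = lstsq(trial.T, template.T)  # solves trial.T @ s ~= template.T
    return s
```

Averaging `transferred_filter` outputs over the training trials then gives the final transferred filter for one source subject.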
Suppose that $X$ refers to the test data from the target subject; the two correlation coefficients provided by the $n$-th source subject can be calculated as

$$\rho^{n,1}_i = \mathrm{corr}\!\left( (\hat{S}^n_i)^{T} X,\; I^n_i \right), \qquad \rho^{n,2}_i = \mathrm{corr}\!\left( (\hat{T}^n_i)^{T} X,\; R^n_i \right).$$

According to the distance between the source subject and the target subject, weights are assigned to the correlation coefficients of the different source subjects. For the $i$-th stimulus and the $n$-th source subject, the distance is measured by the correlation coefficient between the spatially filtered training trials of the target subject and the corresponding transferred template:

$$d^{n,1}_i = \frac{1}{N_t}\sum_{j=1}^{N_t} \mathrm{corr}\!\left( (\hat{S}^n_i)^{T} \ddot{\chi}_{i,j},\; I^n_i \right), \qquad d^{n,2}_i = \frac{1}{N_t}\sum_{j=1}^{N_t} \mathrm{corr}\!\left( (\hat{T}^n_i)^{T} \ddot{\chi}_{i,j},\; R^n_i \right).$$

The weights, also called contribution scores, are then obtained by normalizing the distances over the source subjects:

$$p^{n,1}_i = \frac{d^{n,1}_i}{\sum_{m=1}^{N_{sub}} d^{m,1}_i}, \qquad p^{n,2}_i = \frac{d^{n,2}_i}{\sum_{m=1}^{N_{sub}} d^{m,2}_i}.$$

Therefore, for the $i$-th stimulus frequency, the correlation vector $\rho_i$ collects four coefficients: the two contribution-weighted sums over the source subjects, $\sum_{n=1}^{N_{sub}} p^{n,1}_i \rho^{n,1}_i$ and $\sum_{n=1}^{N_{sub}} p^{n,2}_i \rho^{n,2}_i$, and the two corresponding coefficients computed with the target subject's own spatial filter $\ddot{w}_i$, individual template, and artificial reference. These correlation coefficients are employed to construct the final feature for target recognition,

$$f_i = \sum_{k=1}^{4} \operatorname{sign}\!\left( \rho_i(k) \right) \cdot \rho_i(k)^2,$$

and the frequency of a test trial is determined by

$$\hat{i} = \arg\max_i f_i.$$

The framework of the proposed cross-subject transfer learning method is shown in Fig. 2.
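A minimal sketch of the decision stage: the sign-preserving square combination and the distance-based weighting. The helper names and the simple sum-to-one normalization in `contribution_scores` are our assumptions, not necessarily the paper's exact formulas:

```python
import numpy as np

def corr(a, b):
    """Pearson correlation between two (flattened) signals."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def contribution_scores(dists):
    """Normalize per-source-subject distances into weights summing to 1
    (one plausible normalization)."""
    d = np.asarray(dists, dtype=float)
    return d / d.sum()

def decide(rho):
    """rho: (N_f, 4) correlation vector per stimulus; combine the four
    coefficients with a sign-preserving square and pick the argmax."""
    features = (np.sign(rho) * rho ** 2).sum(axis=1)
    return int(np.argmax(features))
```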

A. Performance Evaluation
Average classification accuracy and information transfer rate (ITR) are two widely used indicators for evaluating the performance of SSVEP-based BCIs. ITR (bits/min) can be calculated as follows:

$$ITR = \frac{60}{T}\left[\log_2 N + P\log_2 P + (1-P)\log_2\frac{1-P}{N-1}\right],$$

where $N$ is the number of targets, $P$ is the accuracy of target identification, and $T$ is the average time for a selection, including the gaze shifting time (0.5 s) and the gaze time. Fig. 3 shows the average accuracy and ITR for the proposed method and the compared methods. Because the sampling rates of the two datasets differ, different data lengths were used so that each window contained an integer number of samples: 0.2 s to 1 s in steps of 0.2 s for Dataset I, and 0.25 s to 1 s in steps of 0.25 s for Dataset II. The accuracy and ITR were obtained via leave-one-block-out cross-validation, where five (Dataset I) or four (Dataset II) blocks were used for training and the remaining block for testing. For the proposed method, source subjects were selected randomly for transfer. To obtain a general estimate of performance, each process was repeated ten times for Dataset I and five times for Dataset II; the different numbers of repetitions reflect the different sizes of the two datasets. The averaged results are reported. The number of source subjects was five for both datasets, for the reason clarified in Section III-B. It is evident that the proposed method achieves higher accuracy and ITR than TRCA/SSCOR with different time windows (TWs) on both datasets. One-way repeated-measures ANOVA was conducted to compare the classification performance of the methods on the two datasets; the statistical analysis shows significant differences among these methods in both accuracy and ITR at each data length. Fig. 4 shows the probability density of classification accuracy for the three methods on (a) Dataset I and (b) Dataset II via violin plots, for SSVEP signals with different data lengths.
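The ITR expression above is the standard Wolpaw formula; a small reference implementation (the function name is ours):

```python
import math

def itr_bits_per_min(n_targets, p, t_select):
    """Wolpaw ITR for an N-target BCI: p is the accuracy in (0, 1],
    t_select the average selection time in seconds (gaze + shift)."""
    bits = math.log2(n_targets)
    if 0 < p < 1:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n_targets - 1))
    return bits * 60.0 / t_select
```

At chance level (p = 1/N) the ITR is zero, and at p = 1 it reduces to 60·log2(N)/T.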
The violin plot illustrates the distribution of quantitative data in a visually intuitive way: the thick black line in the middle represents the median, the black lines on either side represent the interquartile range (25th and 75th percentiles), and wider regions denote values that occur more frequently. As shown in Fig. 4(a) and Fig. 4(b), the violin plots for the proposed method (in pink) generally present higher medians and more concentrated distributions. The experimental results therefore indicate that the proposed method achieves a more stable and superior classification performance across subjects compared with TRCA and SSCOR. As an example, Fig. 5 shows the accuracy comparison between the proposed method, TRCA, and SSCOR for different target subjects. The source subjects were randomly selected; in this case, their indexes are [7 12 18 19 33], and the remaining thirty subjects were used as target subjects for performance comparison. The results show that the proposed method achieves higher SSVEP classification accuracy for almost all target subjects. Fig. 6 illustrates the feature values of the forty stimuli provided by the proposed method and TRCA for an example target subject (S17) with a 0.6 s data length. Since the performance gap between TRCA and the proposed method is smaller than that for SSCOR, we further compared these two methods. The feature values of the proposed method were calculated via (27). Each sub-figure represents a test trial, and the sub-title indicates the correct recognition result. The first four trials were selected and enlarged for better viewing of details. For each test trial, forty feature values were calculated, and the stimulus corresponding to the largest value was determined as the target via (28). The blue and orange circles represent the decisions of the proposed method and TRCA, respectively; hollow circles turn solid when the decision is correct.
Clearly, the proposed method provided more accurate recognition results. Moreover, for those trials where both methods gave correct results, the proposed method shows more distinctive and apparent feature values, such as for the 2nd, 6th, 8th, 14th, and 16th stimuli. This indicates the effectiveness of the proposed feature vector construction strategy in (26).

B. The Effect of Parameters
1) The Number of Training Blocks: An important goal of the proposed method is to reduce the need for individual training data; it should classify SSVEP responses with sufficient accuracy even with a reduced number of individual training blocks. Fig. 7 uses heat maps to show the SSVEP classification accuracy of TRCA, SSCOR, and the proposed method with various numbers of training blocks on (a) Dataset I and (b) Dataset II. A heat map is a graphical representation of data that displays values by color in two dimensions, providing a more visual way to describe numeric values. In the heat maps, the x-axis refers to the classification method with the corresponding number of training blocks, and the y-axis indicates the subject index. The accuracy of the target subjects is reported here. The number of training blocks ranges over [3, 5] for Dataset I and [2, 4] for Dataset II. The heat maps encode classification accuracy on a color scale from light (highest) to dark (lowest). As shown in Fig. 7(a) and Fig. 7(b), the proposed method generally produces the lightest squares regardless of the number of training blocks, and the squares generally become lighter as the amount of training data increases. Table I shows the numerical classification accuracy of the three methods and the corresponding one-way repeated-measures ANOVA results. The results reveal a statistically significant difference (P < 0.0001) between the compared methods for all numbers of training blocks on Dataset I and Dataset II. In conclusion, this table further demonstrates the effectiveness of the proposed method with more quantitative evidence.

Fig. 5. Comparison of the accuracy of the proposed method, TRCA, and SSCOR for different target subjects with a 0.6 s data length. The source subjects were selected randomly; in this case, the source subjects are [7 12 18 19 33], and the rest are target subjects.

Fig. 6. Feature values of the forty stimuli obtained by the proposed method and TRCA using a 0.6 s time window for an example subject (S17). The source subjects were selected randomly. The blue and orange circles represent the recognition results of the proposed method and TRCA; hollow circles turn solid when the result is correct.
2) The Number of Channels: We further investigated how the number of electrodes affects the performance of the proposed method and the compared methods. Fig. 8 shows the classification accuracy results for (a) Dataset I and (b) Dataset II. As the number of channels increases, the recognition accuracy generally increases for all methods. As indicated in Fig. 8(a) and Fig. 8(b), the proposed method always provides the highest classification accuracy with a different number of channels ranging from five to nine for each dataset. Besides, the statistical analysis results show that there is a significant difference between the three methods.
3) The Number of Source Subjects: Fig. 9 shows how the number of source subjects affects the performance of the methods on (a) Dataset I and (b) Dataset II. The classification accuracy in Fig. 9(a) and Fig. 9(b) is calculated over the target subjects only, excluding the source subjects. Therefore, to make the comparison fairer, TRCA and SSCOR also show varying accuracy values for different numbers of source subjects. As the number of source subjects increases, the recognition performance of the proposed method generally improves slightly and then decreases. The highest value typically occurs at five source subjects, so this number was used in the analysis. The figure also shows that the number of source subjects does not have a significant effect on the performance of the proposed method, making this parameter choice representative and reasonable. The work in [31] adopts the same setting for the same publicly available dataset.

C. Filter-Bank Analysis
Filter-bank analysis was used to further compare the recognition performance of the proposed method and the other methods in this study. The filter-bank technique decomposes the SSVEP signals into $N_b$ sub-bands to exploit the information embedded in the harmonic components [39]. The cut-off frequency range was set between $b \times 8$ Hz and 90 Hz for the $b$-th sub-band, where $b = 1, 2, \ldots, N_b$ is the sub-band index. The feature $\beta^b_i$ was extracted from the $b$-th sub-band signals, and a weighted sum over all sub-bands was then computed as

$$\rho_i = \sum_{b=1}^{N_b} a(b) \cdot \beta^b_i,$$

where $a(b)$ is the sub-band weight [32]. The target frequency can then be recognized by

$$\hat{i} = \arg\max_i \rho_i.$$

The proposed method provided the highest accuracy and ITRs for all data lengths. One-way repeated-measures ANOVA was conducted to further compare these methods; the statistical analysis indicates significant differences among the three methods in terms of accuracy and ITR on each dataset.
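A sketch of the sub-band weighting and combination described above. The weight form a(b) = b^(-1.25) + 0.25 is the one popularized by filter-bank CCA; since the exact constants are not stated in the text, they are assumptions here:

```python
import numpy as np

def fb_weights(n_bands, exponent=1.25, offset=0.25):
    """Sub-band weights a(b) = b**(-exponent) + offset, decreasing with
    the sub-band index; the constants are assumed FBCCA-style defaults."""
    idx = np.arange(1, n_bands + 1, dtype=float)
    return idx ** (-exponent) + offset

def fb_combine(betas):
    """betas: (N_b, N_f) per-band features; return the weighted sum per
    stimulus and the index of the recognized target."""
    w = fb_weights(betas.shape[0])
    rho = w @ betas
    return rho, int(np.argmax(rho))
```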

D. Performance Comparison With Data Augmentation Methods
In this study, the proposed method incorporates SSVEP data from the source subjects to effectively improve the recognition performance for the target subject; in other words, the data in the target domain are augmented with auxiliary data from the source domain. In this subsection, the proposed method is further compared with two data augmentation methods: multi-stimulus eCCA (MSCCA) [40] and task-discriminant component analysis (TDCA) [41]. The numbers of channels and training blocks were set to nine and five for all methods. For TDCA, the number of subspaces and the number of delayed points were eight and one, respectively. As shown in Fig. 11, the proposed method achieved the highest accuracy and ITRs among all compared methods at almost all data lengths. A one-way repeated-measures ANOVA revealed a statistically significant difference between the compared methods. These evaluation results further demonstrate the effectiveness and feasibility of the proposed method for SSVEP recognition in BCI systems.

Bar chart of the classification accuracy and ITR of the three methods with different numbers of sub-bands. The error bars represent the SEM. The asterisks indicate significant differences between the three methods obtained by one-way repeated-measures ANOVA (*: p < 0.05, **: p < 0.01, ***: p < 0.001, ****: p < 0.0001).

A. Model's Performance
Most recognition methods in the SSVEP-based BCI field build spatial filters by considering the relationship between the EEG signal and the artificial reference or the individual template (e.g., CCA and ITCCA [12]), or the relationship across training trials (e.g., TRCA and SSCOR [19], [20]). In this study, the spatial filter was trained with multiple similarity constraints. Specifically, maximizing reproducibility across trials can extract task-related components [20], but it may also introduce task-related noise [42]. It is therefore reasonable to suppress noise and extract more SSVEP-related features by additionally maximizing the covariance between the training trials and the individual template, between the training trials and the artificial reference, and between the two templates. As a cross-subject scheme, the transferred template and transferred spatial filter are used to boost SSVEP detection performance for the target subject. As shown in Fig. 3, the accuracy of the proposed method is 7.19% higher than that of TRCA and 19.05% higher than that of SSCOR on Dataset I with a 0.6 s data length. Moreover, the proposed inter-subject transfer learning scheme does not require massive amounts of training data from the target subject and still achieves superior SSVEP classification performance: as shown in Fig. 7(a), the accuracy of TRCA with five training trials (63.69%) is close to that of the proposed method with only three training trials (65.81%) on Dataset I.

B. Feature Vector Construction
In this study, the feature vector (26) includes four correlation coefficients, two of which come from the source subjects and two from the target subject. We further explored the difference in classification accuracy and ITR between this design and a feature vector built from the target subject's information only. Fig. 12 shows the comparison results on (a) Dataset I and (b) Dataset II. The proposed method shows better SSVEP recognition performance than the variant without transfer learning. Paired t-tests were used to compare the two designs; the statistical results show significant differences in accuracy or ITR between the two methods at each data length on both datasets. This indicates that the information transferred from source subjects is beneficial for improving the SSVEP recognition performance of the target subject.

C. Future Work
The proposed method designed transferred templates and transferred spatial filters to enhance the target subject's classification performance; the temporal knowledge contained in the source subjects was not considered. The temporal information hidden in SSVEP signals may also contribute to improving the recognition effectiveness of an SSVEP-based BCI system. Future work will therefore explore spatio-temporal filtering methods to transfer knowledge across subjects.

V. CONCLUSION
In this study, a cross-subject transfer learning scheme was proposed for enhancing SSVEP classification performance. The spatial filter was first trained via multiple covariance maximization. The relationships between training trials, the individual template and artificial reference were properly considered in the spatial filter training process. The spatial filters were then applied to the aforementioned templates to construct two new transferred templates, on which the transferred spatial filter can be obtained accordingly. The contribution scores of different source subjects to the feature vector were calculated by their distances from the target subject. Finally, a four-dimensional feature vector was constructed for each stimulus to achieve SSVEP recognition. The effectiveness and feasibility of the proposed method were demonstrated via experimental evaluation on a publicly available dataset and a self-collected dataset.
Data Access Statement: The data presented in this study are available from the corresponding author upon request.