A Calibration-Free Hybrid Approach Combining SSVEP and EOG for Continuous Control

While SSVEP-BCIs have been widely developed to control external devices, most of them rely on a discrete control strategy. A continuous SSVEP-BCI enables users to deliver commands continuously and receive real-time feedback from the devices, but it suffers from the transition state problem, a period of erroneous recognition when users shift their gaze between targets. To resolve this issue, we proposed a novel calibration-free Bayesian approach that hybridizes SSVEP and electrooculography (EOG). First, canonical correlation analysis (CCA) was applied to detect the evoked SSVEPs, and the saccade during the gaze shift was detected from the EOG data using an adaptive threshold method. Then, the new target after the gaze shift was recognized by a Bayesian approach, which combined the SSVEP and saccade detections and calculated the optimized probability distribution over the targets. Eighteen healthy subjects participated in the offline and online experiments. The offline experiments showed that the proposed hybrid BCI achieved significantly higher overall continuous accuracy and shorter gaze-shifting time than FBCCA, CCA, MEC, and PSDA. In the online experiments, the proposed hybrid BCI significantly outperformed the CCA-based SSVEP-BCI in terms of continuous accuracy (77.61 ± 1.36% vs. 68.86 ± 1.08%) and gaze-shifting time (0.93 ± 0.06 s vs. 1.94 ± 0.08 s). Additionally, participants perceived a significant improvement over the CCA-based SSVEP-BCI when the newly proposed decoding approach was used. These results validated the efficacy of the proposed hybrid Bayesian approach for continuous BCI control without any calibration. This study provides an effective framework for combining SSVEP and EOG and promotes the potential applications of plug-and-play BCIs in continuous control.


I. INTRODUCTION
Brain-computer interfaces (BCIs) bridge the brain with external devices by directly decoding users' intentions without the neuromuscular pathway [1]. Among the noninvasive modalities, the steady-state visual evoked potential (SSVEP)-based BCI has attracted widespread attention due to its advantages of a high information transfer rate (ITR) and little user training time [2]. The SSVEP-BCI has been successfully developed in brain-actuated wheelchairs [3], [4], robotic arms [5], and spellers [6], [7], showing its potential for practical applications.
Currently, most SSVEP-BCI studies rely on the discrete control strategy, in which users' intentions can only be output once every 2-4 s due to the fixed timeline. In detail, trials with a duration of 2-4 s and a short rest interval of 0.5 s in between are sequentially implemented in discrete BCI, and users can only perform one mental task in each trial. The system extracts the EEG segment of the entire trial for classification, and therefore users output one command and receive one feedback at the end of each trial discretely. Users' intention output must follow the system pace [3], [4], [5], [6], [7]. On the contrary, the continuous BCI consecutively estimates users' intentions and outputs commands to control external devices, and users receive real-time feedback from the instant response of the devices [8], [9], [10]. In detail, the trial duration in continuous BCI is much longer (usually >10 s), during which users modulate their neural activities consecutively to control the external device, e.g., controlling a wheelchair to reach a destination. The moving window approach is adopted: the EEG segments of the latest 2-4 s (window length T) are extracted every 40-100 ms (moving step t) over time, so users can perform multiple mental tasks within the trial and control the device using a series of commands with short time intervals, which coincides with realistic application scenarios [11], [12]. Users can switch between different mental tasks freely without a fixed timeline. Researchers have implemented SSVEP-BCI systems for continuous control of a virtual ball [13], a cursor [14], a simulated vehicle [15], and a quadcopter [16].
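The moving-window extraction described above can be sketched as follows (a minimal illustration with the paper's 31.25 ms step and a 2 s window; the function name and interface are ours, not the authors' implementation):

```python
import numpy as np

def sliding_windows(signal, fs, win_s=2.0, step_s=0.03125):
    """Yield (time, segment) pairs under the moving-window scheme:
    the latest `win_s` seconds of data, advanced by `step_s` seconds.
    `signal` is an array whose last axis is time, sampled at `fs` Hz."""
    w, step = int(win_s * fs), int(step_s * fs)
    for end in range(w, signal.shape[-1] + 1, step):
        yield end / fs, signal[..., end - w:end]
```

Each yielded segment would then be classified independently, producing a stream of decoding results rather than one decision per trial.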
In continuous SSVEP-BCI, users frequently shift their gaze among different targets to change the behavior of the device, which results in the transition state problem [15], [16]. Specifically, when users shift their gaze from one stimulus to another, e.g., to make a left turn after controlling the wheelchair through a corridor, the EEG segments extracted during the following T seconds contain both the SSVEP of the old target and the SSVEP of the new one. During this period, the BCI cannot recognize the new target due to the mixture of two different SSVEP signals and outputs erroneous and chaotic commands. This false recognition period is called the transition state in the literature [15], [16]. Bi et al. found that SSVEP-BCI recognition was at chance level during the transition state [15]. Decreasing the window length T might shorten the duration of the transition state between targets, but it also results in low decoding performance when users steadily gaze at one target, so there is a trade-off between the accuracy and the speed of BCI systems [17].
Developing SSVEP detection methods under short time windows could solve the transition state problem, and many training-based methods have been proposed in recent years, e.g., task-related component analysis (TRCA) [7] and task-discriminant component analysis (TDCA) [18]. To successfully implement these methods, the initial phases of the stimuli in each discrete trial need to be the same, so that the extracted EEG segments are phase-locked to the stimulus onset and share the same phases. However, this requirement does not hold in continuous SSVEP-BCI, because the EEG segments are extracted continuously based on the moving window approach, and thus the segments at different moments have different phases and are not phase-locked to each other. Directly implementing those methods in continuous signal processing results in poor recognition performance [19], [20]. Another solution was proposed by Wang et al. [16], in which the distance between stimuli was enlarged so that the head movement required when users shifted between targets could be captured and recognized by a head-mounted device (HMD). Nevertheless, head turning is more demanding than a gaze shift, and it might cause physical and mental fatigue due to the increased task burden. To sum up, the current solutions cannot effectively address the transition state problem in the continuous SSVEP-BCI.
This study proposes a hybrid BCI combining SSVEP and EOG signals to solve the transition state problem. In detail, four stimuli flickering at different frequencies, corresponding to the four frequently used direction commands (up, right, down, and left), are presented in the graphical user interface (GUI). Users output commands by gazing at the desired target. When users shift their gaze towards a new target, besides the SSVEP elicited by the new target, a saccade, i.e., a rapid eye movement that occurs when the eyes move from one position to another [21], [22], happens spontaneously and naturally. The saccade can be captured by the EOG signals and combined with the SSVEP signals to identify the new target and mitigate the transition state problem. Therefore, the SSVEP and EOG signals are continuously extracted and processed to predict users' intentions. In detail, the prior probability distribution over the targets is first obtained by decoding the SSVEP signals, and a detection method based on an adaptive threshold is proposed and utilized to detect saccade events. Then, when a valid saccade is captured, it serves as an additional event, and the Bayesian approach is utilized to calculate the posterior probability distribution, so that the new target can be recognized rapidly and accurately. While our prior work presented a calibration-based hybrid BCI [23], this study proposes a calibration-free framework for combining SSVEP and EOG to solve the transition state problem, and the proposed BCI system was tested in both offline and online experiments in terms of both BCI performance and user experience. The results showed that the proposed hybrid BCI significantly outperformed the SSVEP-BCI in terms of accuracy and gaze-shifting time (the duration of the transition state) without increasing the physical or mental task burden.
The paper is organized as follows. The experimental design, online signal processing, and evaluation metrics are described in Section II. The experimental results are presented in Section III, with the discussion and conclusion presented in Sections IV and V, respectively.

A. Subjects and Experimental Setup
Eighteen healthy subjects (4 females; average age 22.4 ± 1.9 years; range: 19-25; none had prior BCI experience with SSVEP or EOG tasks) participated in the experiments. The study first conducted an offline experiment and then validated the efficacy of the proposed approach in an online experiment. Each subject was requested to participate in one offline session and one online session about one month later. Fifteen subjects participated in the offline experiment. Twelve of them also completed the online session, while three dropped out due to schedule conflicts; three additional subjects were therefore recruited for the online experiment. All procedures and protocols were approved by the Institutional Review Board of Shanghai Jiao Tong University (Protocol No. IRB HRP E2021216I, approved on March 4, 2021). Informed consent was obtained from all subjects prior to their participation in the experiment.
EEG and EOG signals were acquired at 1024Hz using a 64-channel Biosemi Active Two system, a cap with active electrodes, and four additional EOG electrodes (Biosemi, Amsterdam, The Netherlands). Two electrodes near channel POz, named CMS and DRL, were used as reference and ground. The offsets of all electrodes were kept below ±40 mV. A notch filter at 50Hz was applied to the raw EEG and EOG signals.
Nine channels in the parietal and occipital regions, including Pz, POz, PO3, PO4, PO7, PO8, Oz, O1, and O2, were used to decode the SSVEP signals, as shown in Figure 1 (a). A bandpass filter from 0.1 to 90 Hz was applied to the EEG signals before any further processing. Four channels on the horizontal (A, B) and vertical (C, D) axes of the eyes were used to record EOG data, as shown in Figure 1 (b). A low-pass filter at 20 Hz was applied to the EOG signals. All filters mentioned above were fifth-order Chebyshev Type I filters, implemented in software and applied to each epoch of data during the experiment. A Tobii TX300 (Tobii, USA) eye tracker with a sampling rate of 300 Hz was used to record eye movements during the experiments. It served as a gold standard for evaluating the gaze position and was not used for decoding. A chin rest was used to fixate the subjects' head position.
The stimuli used to evoke SSVEPs were presented on a 24.5-inch LCD monitor with a 1920 × 1080 pixel resolution and a refresh rate of 240 Hz. A sampled sinusoidal stimulation method [24] was used to present the up, right, down, and left stimuli flickering at 7.5, 9, 10.5, and 12 Hz, respectively.
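Sampled sinusoidal stimulation modulates each stimulus's luminance frame by frame. A minimal sketch, assuming the commonly used form in which a sinusoid at the stimulation frequency is sampled at the monitor refresh rate and scaled to [0, 1] (the function name and defaults are ours):

```python
import numpy as np

def stimulus_luminance(freq, n_frames, refresh=240, phase=0.0):
    """Per-frame luminance in [0, 1] for a flicker at `freq` Hz, assuming
    the sinusoid is sampled once per frame at `refresh` Hz:
    s(i) = 0.5 * (1 + sin(2*pi*freq*i/refresh + phase))."""
    i = np.arange(n_frames)
    return 0.5 * (1.0 + np.sin(2 * np.pi * freq * i / refresh + phase))
```

For a 12 Hz stimulus on a 240 Hz monitor, the pattern repeats every 20 frames, so the flicker stays exactly periodic on the display.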

B. Experimental Design and Protocol
An offline experiment was designed to simulate the transition process in the continuous SSVEP-BCI. For example, users might first gaze at the up stimulus to drive the wheelchair forward through a corridor, then shift their gaze toward the right stimulus to output a right-turn command, but the BCI system outputs erroneous commands due to the transition state problem. Therefore, the offline experiment, which included all possible shifting processes between targets, was conducted to (1) verify the existence of the transition state problem reported in the literature, and (2) explore the feasibility of the potential solution, i.e., the hybrid approach combining SSVEP and EOG. Specifically, the offline experiment consisted of ten runs. Each run was composed of 12 trials, and the timeline of each trial is shown in Figure 2. During each trial, the subjects were instructed to gaze at the target indicated by the visual cue (a red square). At the beginning of each trial, all stimuli started flickering, and the visual cue was located at the center of the screen, where no stimulus was presented, for three seconds. Next, the visual cue moved to one of the four stimuli (up, right, down, and left), instructing the subjects to shift their gaze from the center of the screen to the target stimulus (target 1) and gaze at it for 4-6 s. Then, the visual cue moved to one of the remaining three directions, and subjects needed to shift their gaze to a new target stimulus (target 2) and gaze at it for another 4-6 s. Afterward, the stimuli stopped flickering, and the screen was blank for 2 s before the next trial began. Note that the subjects were only allowed to shift their gaze through eye movements while their head position was fixated on the chin rest. The durations of cueing targets 1 and 2 were randomly chosen from 4, 5, or 6 s to prevent prediction and adaptation by the users.
The 12 trials covered all possible ordered selections of two targets out of the four ($A_4^2 = 12$), and the sequence of trials within each run was randomly chosen. A rest of several minutes between two consecutive runs was set to avoid visual fatigue.
After the offline experiment, an online continuous BCI experiment was conducted one month later to verify the effectiveness of the proposed hybrid BCI. The online experiment also consisted of ten runs. The settings, e.g., the stimuli, the red visual cue, and the head fixation, were in accordance with the offline experiment, but continuous decoding and feedback were added to the online experiment. In detail, a gentle audio reminder sounded two seconds before the visual cue changed to target 2 in each trial, indicating the beginning of the continuous decoding. Meanwhile, visual feedback (a green square) was provided and kept updating, indicating the latest detection result. The feedback was updated every 31.25 ms, the same as the moving step of the moving window approach (see details in Section II-C). The signal processing method of the BCI system for each run was chosen from the SSVEP and the hybrid approach (see details in Section II-C) based on block randomization. No subject was told which approach was used in each run. Parameters for online decoding, such as the window length, were chosen based on the offline analysis (see details in Section III-C). After each run, subjects were asked to answer a few questions immediately to assess their performance and feelings (see details in Section II-D).
C. Signal Processing
1) Gaze Position Determined by the Eye Tracker: The eye tracker was used as a gold standard in the experiments. A threshold was applied to the gaze position values, rounding them into three categories (horizontal: 0, left side; 0.5, center; 1, right side; vertical: 0, upside; 0.5, center; 1, downside), where $G$ was the raw value of the gaze position and $G_{adjust}$ was the projected value after thresholding. The process of obtaining $G$ and $G_{adjust}$ was applied both horizontally and vertically. Based on $G_{adjust}$, the target that the subject gazed at could be obtained. The gaze-shifting time was obtained at the rising or falling edge of the adjusted curve. Note that we used the time of the last rising or falling edge as the gaze-shifting time only if the gaze shift was correct both horizontally and vertically; otherwise, the gaze-shifting time was considered invalid.
2) SSVEP Detection by the CCA Algorithm: SSVEPs were analyzed using canonical correlation analysis (CCA) [25], a widely used calibration-free method for detecting the frequency of SSVEPs due to its high precision, low computational complexity, and ease of use. Given two multi-dimensional variables $X$ and $Y$ and their linear combinations $x = X^T W_X$ and $y = Y^T W_Y$, the weight vectors $W_X$ and $W_Y$ were optimized to maximize the correlation between $x$ and $y$ by solving the following problem:
$$\rho = \max_{W_X, W_Y} \frac{E[W_X^T X Y^T W_Y]}{\sqrt{E[W_X^T X X^T W_X]\, E[W_Y^T Y Y^T W_Y]}} \quad (1)$$
The maximum value of $\rho$ with respect to $W_X$ and $W_Y$ was the maximum canonical correlation. For SSVEP detection, $X \in \mathbb{R}^{N \times M}$ denoted the multi-channel EEG data epoch with $N$ channels and $M$ samples, and $Y \in \mathbb{R}^{Q \times M}$ was the reference signal:
$$Y = \begin{bmatrix} \sin(2\pi f t) \\ \cos(2\pi f t) \\ \vdots \\ \sin(2\pi N_h f t) \\ \cos(2\pi N_h f t) \end{bmatrix}, \quad t = \frac{1}{f_s}, \frac{2}{f_s}, \ldots, \frac{M}{f_s} \quad (2)$$
where $f$ was the stimulation frequency, $N_h$ was the number of harmonics, $f_s$ was the sampling rate, and $Q$ was set to $2N_h$. CCA calculated the canonical correlation between the multi-channel EEG data epoch and the reference signals with respect to each stimulation frequency. The frequency with the maximal correlation was recognized as the attended frequency of the SSVEPs. SSVEPs were detected with different window lengths (from 0.5 s to 3.0 s in steps of 0.25 s) and a 31.25 ms sliding window step in the offline experiments. For the online experiments, the sliding window step was the same as in the offline experiments, and the window length was determined based on the offline analysis (see details in Section III-C).
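A compact numerical sketch of CCA-based SSVEP detection (our illustration, not the authors' implementation): the largest canonical correlation between two sets of centered, row-wise samples equals the top singular value of the product of their orthonormal bases, which avoids an explicit generalized eigenproblem.

```python
import numpy as np

def max_canonical_corr(X, Y):
    """Largest canonical correlation between X (M, p) and Y (M, q),
    rows = samples: top singular value of Qx^T Qy, where Qx and Qy are
    orthonormal bases (via QR) of the centered data matrices."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

def cca_detect(eeg, freqs, fs, n_harmonics=5):
    """Return (index, correlations) of the stimulation frequency whose
    sin/cos harmonic reference correlates most with the EEG epoch (N, M)."""
    t = np.arange(eeg.shape[1]) / fs
    rhos = []
    for f in freqs:
        ref = np.vstack([fn(2 * np.pi * (h + 1) * f * t)
                         for h in range(n_harmonics)
                         for fn in (np.sin, np.cos)])
        rhos.append(max_canonical_corr(eeg.T, ref.T))
    return int(np.argmax(rhos)), rhos
```

Running this over each sliding window, with the four stimulation frequencies from Section II-A, yields the stream of CCA decisions used as the baseline decoder.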
3) Saccade Detection Algorithm: Saccades occurred when subjects shifted their gaze in a new direction. Saccades were captured by the EOG signals and detected using an adaptive threshold on the first-order derivative of the EOG signals, inspired by the work of Behrens and colleagues [26]. First, in each 31.25 ms sliding step, the standard deviation of the 200 preceding derivative values of the EOG signal was calculated; then, the adaptive threshold $N\sigma$ was obtained as a multiple ($N = 3$) of the standard deviation and was updated constantly. During fixation, most of the derivative values were distributed in the interval between $+N\sigma$ and $-N\sigma$, while the threshold was exceeded when a saccade occurred due to the rotation of the eyeball. Note that the threshold was held fixed for a short period once a saccade was detected to avoid the influence of a following saccade. An event was labeled as a valid saccade if the duration of exceeding the threshold was longer than a required minimum duration, which was empirically set to 30 ms. If a valid saccade was detected, the threshold remained constant for 200 ms to detect a possible following saccade; otherwise, the threshold was updated instantly according to the latest 200 preceding values. The detection procedure is shown in Figure 3.
4) SSVEP-EOG-Hybrid Bayesian Approach: A gaze shift between targets causes a nonstationary dynamic process reflected in the EEG signals, and an erroneous transition state occurs when SSVEP detection methods are used alone. In this study, an SSVEP-EOG-hybrid Bayesian approach combining SSVEP detection and saccade detection was proposed to alleviate the transition state problem. Suppose a subject's last fixated stimulus $S_{pre}$ was predicted by the previous output of the decoder. The canonical correlation $\rho$ calculated by CCA for each stimulus was interpreted as the probability $P(S_i)$ that the subject was currently fixating on that stimulus:
$$P(S_i) = \frac{\rho_i}{\sum_{j=1}^{n} \rho_j} \quad (3)$$
where $S_i$ was the event that the subject was currently gazing at stimulus $i$ ($i = 1, 2, 3, 4$ represented the up, right, down, and left stimulus, respectively), $\rho_i$ was the canonical correlation corresponding to each stimulus, and $n$ was the number of stimuli on the screen, which equaled 4. If no saccade was detected, the stimulus with the highest probability was selected as the target, which was the same as CCA detection. However, if a saccade event $E$ was detected from the EOG, the probability distribution over the current stimuli could be better depicted by a posterior probability. Thus, the posterior probability for each stimulus was obtained using Bayesian estimation:
$$P(S_i \mid E) = \frac{P(E \mid S_i) P(S_i)}{\sum_{j=1}^{n} P(E \mid S_j) P(S_j)} \quad (4)$$
where $P(E|S_i)$ was the probability of the current saccade event given that the subject shifted the gaze from $S_{pre}$ to $S_i$. Since $P(S_i)$ was a known value, the critical step was to estimate $P(E|S_i)$. In our approach, the probability distribution $P(E|S_i)$ was estimated from the offline experimental data. If $S_i = S_{pre}$, meaning the subject did not shift his/her gaze, the probability of detecting a saccade equaled the false positive rate (FPR) of saccade detection. If $S_i \neq S_{pre}$, meaning the subject shifted his/her gaze, the probability of detecting the correct saccade equaled the true positive rate (TPR) of saccade detection, while the probability of detecting a wrong saccade equaled the FPR. Estimating these values required an initial training dataset, but they did not vary much afterwards; thus, the hybrid Bayesian approach was calibration-free. Finally, when a saccade was detected, the stimulus $k$ with the maximum posterior probability $P(S_k|E)$ was selected as the target. The overall detection procedure is shown in Figure 4.
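The Bayesian update described above reduces to a few lines. In this sketch (ours, not the authors' code), the likelihoods collapse to the TPR/FPR values described above, and `matches` is a hypothetical helper input flagging which targets are consistent with the detected saccade direction:

```python
import numpy as np

def saccade_posterior(rhos, s_pre, matches, tpr=0.8644, fpr=0.05):
    """Posterior target probabilities after a detected saccade event E.
    rhos: canonical correlations per stimulus (normalized to the prior);
    s_pre: index of the previously decoded target;
    matches[i]: True when the detected saccade direction is consistent
    with a shift from s_pre to stimulus i.
    TPR/FPR defaults are the offline values reported in the paper."""
    prior = np.asarray(rhos, dtype=float)
    prior = prior / prior.sum()
    lik = np.array([tpr if (i != s_pre and matches[i]) else fpr
                    for i in range(len(prior))])
    post = lik * prior
    return post / post.sum()
```

Because the TPR is much larger than the FPR, a direction-consistent saccade can overturn a prior that still favors the old target, which is exactly how the transition state is shortened.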
D. Performance Evaluations
1) SSVEP Detection Methods for Comparison: In this study, existing SSVEP detection methods, including minimum energy combination (MEC) [27], power spectral density analysis (PSDA) [13], [14], [15], [28], [29], CCA [16], [25], and filter-bank CCA (FBCCA) [30], [31], were implemented to compare against and evaluate the performance of the proposed hybrid approach. In the MEC algorithm, the multiple channels of EEG signals were linearly combined to minimize the contained nuisance signals and noise. Then, test statistics indicating the signal-to-noise ratio (SNR) were calculated for each stimulus frequency, and the largest one was selected as the detected target. In the PSDA algorithm, the power spectral density was estimated by the FFT, and the intensity of the response for each stimulus frequency was calculated as the sum of the amplitudes at the fundamental frequency and the second harmonic; the frequency with the largest response was selected as the detected target. FBCCA utilized multiple filters with different passbands to filter the EEG segment, applied CCA to each sub-band component, obtained the weighted sum of the correlation values across sub-bands, and finally selected the target with the maximum correlation as the detected target. The parameters of FBCCA used in this study were identical to those in [30].
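As one concrete example of the comparison methods, the PSDA response described above can be sketched as follows (single-channel illustration; the function name is ours):

```python
import numpy as np

def psda_score(chan, freq, fs):
    """PSDA response strength for one EEG channel: FFT amplitude at the
    stimulation frequency plus its second harmonic."""
    spec = np.abs(np.fft.rfft(chan))
    f = np.fft.rfftfreq(len(chan), 1.0 / fs)
    amp = lambda g: spec[np.argmin(np.abs(f - g))]  # nearest FFT bin
    return amp(freq) + amp(2 * freq)
```

The detected target is then the stimulation frequency with the largest score, mirroring the decision rule used for the other methods.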
2) Metrics Evaluation: The data analysis of this study differed from that of discrete SSVEP-BCI studies. Conventionally, in a discrete SSVEP-BCI experiment, each trial lasts 2-4 s, and users have one task in each trial, i.e., steadily gazing at one target. In this study, although the experiments were still composed of several trials, they differed from the discrete experiment in two aspects: (1) in each trial, users had two different tasks, i.e., gazing at target 1 and then at target 2 sequentially, and each trial lasted 11-15 s; (2) instead of extracting the segment of the whole trial for classification and outputting one command per trial as the discrete BCI does, the moving window approach was applied to each trial with a window length T of 0.5-3.0 s and a moving step of 31.25 ms. Therefore, the results were classified consecutively over time, and three different states, gazing at target 1, gazing at target 2, and the transition state, could all be recognized within one trial. An example of a trial decoding process is shown in Figure 5. Each red circle represents a decoding result, decoded from the T seconds of data preceding the current moment. The time interval between two consecutive red circles is 31.25 ms.
Therefore, the gaze-shifting time (GST; unit: second) and the overall continuous accuracy were used to evaluate the offline and online experiments. As shown in Figure 5, the GST indicates the duration of the transition state and is measured as the duration from $t_1$ to $t_2$ in each trial. $t_1$ represents the moment when the visual cue changed to target 2, and $t_2$ represents the moment when the BCI first detected the new target accurately and steadily. The moment $t_2$ is carefully chosen as the first moment at which the current and the following 20 decoding results are all target 2. The reason is that even when the subject is steadily gazing at target 1, the decoding result might mistakenly be target 2 due to the nonstationary EEG signals and the imperfect performance of the BCI. Therefore, the first moment at which target 2 is successfully decoded after the gaze shift cannot robustly measure the overall delay of the system, because it might be a chaotic recognition result and the following decoding results might not all be target 2.
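The settling rule for $t_2$ (the current and the following 20 outputs, 21 in total, all equal to target 2) can be expressed as a small helper (name and interface ours):

```python
def settle_index(decisions, target, k=21):
    """Index of the first decoding result at which this and the next k-1
    outputs all equal `target`; None if the decoder never settles."""
    for i in range(len(decisions) - k + 1):
        if all(d == target for d in decisions[i:i + k]):
            return i
    return None
```

The GST then follows from the index difference between this settling point and the cue change, multiplied by the 31.25 ms decoding step.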
Moreover, the overall continuous accuracy was calculated as the number of correct detection points divided by the total number of detection points within each trial's continuous decoding range, as shown by the grey dotted arrow in Figure 5. The continuous decoding range lasted from 2 seconds before the visual cue changed from target 1 to target 2 until 4 seconds after the change in each trial, so that the recognition accuracy of the transition state and of the two steady states before and after the transition could be measured together, indicating the overall continuous BCI performance.
Besides, a pseudo-discrete accuracy was calculated for the offline experiment, where two data epochs were extracted from each trial to verify the performance of the SSVEP detection methods. The two data epochs were obtained by extracting the last T seconds before the end of the visual cues for target 1 and target 2, respectively, as shown in Figure 5. For saccade detection, the true positive rate (TPR) and false positive rate (FPR) were used to evaluate the performance in the offline experiment. The TPR was calculated as the number of correctly detected saccade events divided by the number of actual saccade events (one saccade was evoked per trial when subjects shifted their gaze from target 1 to target 2). The FPR was calculated as the number of falsely detected saccade events divided by the number of detection points in the continuous decoding range.
For the online experiment, besides the gaze-shifting time and the overall continuous accuracy, a subjective assessment under the SSVEP and the hybrid Bayesian approaches was recorded after each run through the following questions:
• How fast did the feedback (green square) update to the new target after you shifted your gaze to it?
• How accurate was the feedback when you gazed at the stimulus steadily?
• How much did you like the behavior of the current feedback?
• How much effort did it require to accomplish the experiment?
• During the experiment, were you tired?
• During the experiment, were you nervous?
• During the experiment, were you frustrated?
The answer to each question was rated on a 10-level scale. Note that no subject knew which method was chosen for each run, since a block randomization design was used. Their scores were compared later in the offline analysis.
3) Statistical Analysis: Where applicable, results are expressed as mean ± SEM (standard error of the mean) unless otherwise stated. Repeated-measures analysis of variance (ANOVA) was applied to test the difference in pseudo-discrete accuracy under different numbers of harmonics $N_h$, and the differences in continuous accuracy and GST under different window lengths T and different detection methods. The Greenhouse-Geisser correction was used if the data did not conform to the sphericity assumption according to Mauchly's test. All post hoc pairwise comparisons were Bonferroni corrected. Paired t-tests were employed to compare the online performance of the SSVEP and hybrid approaches. The alpha level was set at 0.05.

A. Pseudo-Synchronous SSVEP Performance in the Offline Experiment
First, the pseudo-discrete accuracy was calculated in the offline experiment to verify the performance of CCA. The pseudo-discrete accuracy of CCA for different window lengths T (from 0.25 s to 3.0 s in steps of 0.25 s) and different numbers of harmonics in the reference signals (i.e., $N_h$ in equation (2)) is shown in Figure 6 (a). The pseudo-discrete accuracy reached 90% when the window length T > 2 s and the number of harmonics $N_h$ > 2. Figure 6 (b) illustrates the pseudo-discrete accuracy under a window length of 2 s, which was used in the online experiment, with an increasing number of harmonics. Repeated-measures ANOVA showed a significant difference between conditions with different $N_h$ (F(9, 126) = 16.33, p < 0.05). Pairwise comparisons revealed a significant difference between $N_h$ = 1 and the other $N_h$ values. $N_h$ was set to 5 empirically to balance accuracy and complexity in the following analysis.

B. Saccade Detection Performance in Offline Experiment
Saccade detection performance played an important role in the proposed hybrid BCI. Figure 7 (a) shows the performance of saccade detection under different saccade directions using the data from the first five runs of the offline experiment. The overall averages (± SEM) of the TPR and FPR were 86.44 ± 2.34% and 5.00 ± 0.54%, respectively. Therefore, the values of $P(E|S_i)$ in Eq. (4) were set as follows: $P(E|S_i) = FPR = 0.05$ if $S_i = S_{pre}$, meaning the subject did not shift his/her gaze but a saccade was detected; $P(E|S_i) = TPR = 0.8644$ if $S_i \neq S_{pre}$ and the detected saccade was coincident with the direction from $S_{pre}$ to $S_i$; and $P(E|S_i) = FPR = 0.05$ if $S_i \neq S_{pre}$ but the detected saccade direction was wrong. Moreover, Figure 7 (b) shows the comparison between the GST detected by the EOG signals (0.57 ± 0.03 s) and the GST captured by the eye tracker (0.43 ± 0.01 s) as a gold standard. The GST detected by EOG was significantly longer than that detected by the eye tracker (t(14) = 2.868, p = 0.012). However, it was still much faster and more accurate than using SSVEP alone (see details in Section III-C).

C. BCI Performance in Offline Experiment
Before conducting online experiments, the performance of the proposed hybrid BCI was analyzed in offline experiments.
The data from the last five runs of the offline experiment were used for this analysis. Figure 8 (a) and Figure 8 (b) illustrate the eye tracker performance as well as the BCI performance of the proposed hybrid Bayesian approach and the existing SSVEP detection methods in terms of continuous accuracy and GST for different window lengths (from 0.5 s to 3.0 s in steps of 0.25 s). The eye tracker showed the highest continuous accuracy (89.64 ± 0.36%) and the shortest GST (0.43 ± 0.01 s). Compared to CCA, FBCCA, MEC, and PSDA, the hybrid Bayesian approach showed higher continuous accuracy and shorter GST at all window lengths. The highest continuous accuracy of the hybrid Bayesian approach was 78.41 ± 1.74% when T = 2.0 s. A two-way repeated-measures ANOVA showed a significant interaction between detection method and window length in continuous accuracy (F(2.877, 40.280) = 15.884, p < 0.01). Further one-way repeated-measures ANOVAs showed that continuous accuracy was significantly affected by the detection method at all window lengths. Post hoc tests showed that the hybrid approach had significantly higher accuracy than all the other detection methods when T > 1.75 s. The hybrid Bayesian approach reached its shortest GST (0.89 ± 0.06 s) when T = 1.5 s. A two-way repeated-measures ANOVA showed a significant interaction between detection method and window length in GST (F(2.449, 34.282) = 18.510, p < 0.01). Further one-way repeated-measures ANOVAs showed that GST was significantly affected by the detection method at all window lengths. Post hoc tests showed that the hybrid approach had a significantly shorter GST than all the other detection methods when T > 1 s.
Discrete accuracy, continuous accuracy, and GST should be considered simultaneously to obtain good performance in the online experiment. Discrete accuracy is an indicator of BCI performance when users fixate on one stimulus, and the above analysis showed that discrete accuracy reached 90% when T = 2.0s. Moreover, the highest continuous accuracy for the hybrid Bayesian approach was obtained when T = 2.0s. Furthermore, repeated-measures ANOVA showed no statistically significant difference in GST between T = 1.5s and T = 2.0s for the hybrid Bayesian approach. Therefore, a window length of 2.0s was selected for the subsequent online BCI experiment to guarantee good performance. Moreover, since the total execution time of the Source, Signal Processing, and Application modules plus the data transmission time between modules must stay within 31.25 ms in one loop, and the computational cost of FBCCA was heavier than that of CCA (13.3 ms vs. 1.2 ms in MATLAB R2020b on a PC with a 2.90 GHz CPU), CCA was chosen as the SSVEP decoding approach for the online comparison.

D. Single Trial Analysis in Offline Experiment
The transition state problem occurring in continuous SSVEP-BCI was verified in the offline experiment. Figure 9 demonstrates the decoding performance in a single trial with different window lengths. The sub-figures on the right side correspond to conditions in which the time window was set to 0.5s, 1s, 1.5s, and 3s, respectively. Red dots in each sub-figure are decoding results using the CCA method, and black lines are visual cues. Figure 10 illustrates the changes in the probability of each stimulus in a single trial with the hybrid approach. The dotted curves indicate the probability P(S_i) interpreted from the canonical correlations, and the solid curves indicate the posterior probability P(S_i|E) calculated by the hybrid Bayesian approach. At the beginning of the trial, the subject was asked to gaze at the up stimulus, and the probability of the up stimulus P(S_up) was higher than the others. Meanwhile, the posterior probabilities P(S_i|E) for all stimuli were the same as P(S_i) since no saccade event was detected. When the subject shifted their gaze towards the left stimulus, SSVEPs evoked by the left stimulus began to accumulate in the sliding window. As a result, the probability of the new target, i.e., the left stimulus P(S_left), rose, and the probability of the previous target, i.e., the up stimulus, declined. Even though the probability of the new target showed a rising tendency, the probability of the previous target remained the highest, resulting in a period of false detection results. However, with the help of the detected saccade event and the Bayesian approach, the posterior probability of the new target was optimized to be the highest. Therefore, the new target was identified more quickly.
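The Bayesian fusion at the core of this behavior is the standard posterior update P(S_i|E) ∝ P(E|S_i)·P(S_i). The sketch below is illustrative only: the function name is ours, and the prior/likelihood numbers are hypothetical values chosen to mimic the single-trial situation described above (previous target "up" still dominating the prior shortly after a leftward gaze shift).

```python
# Minimal sketch of the Bayesian fusion: posterior P(S_i|E) proportional to
# P(E|S_i) * P(S_i), normalized over all candidate targets.
def posterior(prior, likelihood):
    """prior      -- dict target -> P(S_i) from canonical correlations
       likelihood -- dict target -> P(E|S_i) for the detected saccade event
       Returns the normalized posterior P(S_i|E)."""
    unnorm = {s: prior[s] * likelihood[s] for s in prior}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

# Hypothetical values: 'up' (previous target) still has the largest prior
# just after the gaze shift, but a detected leftward saccade re-weights
# 'left' to the top of the posterior.
prior = {'up': 0.45, 'left': 0.35, 'right': 0.10, 'down': 0.10}
like = {'up': 0.05, 'left': 0.8644, 'right': 0.05, 'down': 0.05}
post = posterior(prior, like)
```

This mirrors the single-trial plot: the prior alone would keep reporting the previous target, while the saccade-weighted posterior flips to the new target immediately.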

E. BCI Performance in Online Experiment
Based on the parameters determined in the offline experiments, online experiments were conducted to evaluate the performance of the proposed hybrid BCI. The accuracy and GST across all subjects in the online experiment are shown in Figure 11. The group average accuracy and SEM for CCA and the hybrid Bayesian approach were 68.86 ± 1.08% and 77.61 ± 1.36%, while the group average GST and SEM were 1.94 ± 0.08s and 0.93 ± 0.06s, respectively. Thus, the online accuracy and GST of the hybrid Bayesian approach were comparable to those obtained in the offline experiments (accuracy: 78.41 ± 1.74%; GST: 0.93 ± 0.06s). Furthermore, paired t-tests revealed that the hybrid Bayesian approach significantly outperformed CCA in the online experiments as well (continuous accuracy: t(14) = 11.581, p < 0.0001; GST: t(14) = 16.320, p < 0.0001). An example of the comparison between the hybrid Bayesian approach and the CCA-based SSVEP decoding approach is available in the supplementary video.

F. Questionnaire Evaluation in Online Experiment
In the online experiments, subjects were asked to answer several questions to subjectively evaluate the method used in each run. Figure 12 displays the questionnaire results. Paired t-tests showed significant differences between CCA and the hybrid Bayesian approach in question 1 (How fast did the feedback update to the new target?), t(14) = −8.040, p < 0.01, and question 2 (How accurate was the feedback provided when gazing at the stimulus steadily?), t(14) = 4.551, p < 0.01. No statistically significant differences were found for the other questions.

IV. DISCUSSION
In this study, we proposed a calibration-free hybrid Bayesian approach that combines SSVEP and EOG to enhance continuous BCI performance. The proposed system recognized a new target more quickly during the transition state because the additional saccade event was utilized. The proposed hybrid BCI significantly outperformed the typical SSVEP-BCI in terms of continuous accuracy (77.61 ± 1.36% vs. 68.86 ± 1.08%) and gaze-shifting time (0.93 ± 0.06s vs. 1.94 ± 0.08s). The comparable offline and online results indicate the robustness of the proposed method when applied in a continuous BCI system. The slight difference between the online and offline results might be attributed to the different subjects and the feedback provided during the online experiment.

A. Transition State Problem in Continuous SSVEP-BCI
Currently, many BCI studies utilize a discrete control timeline, in which users perform one mental task during a trial with a duration of 2-4 s, and the EEG segment of the entire trial is extracted for classification. Thus, the external devices receive commands and respond discretely. Some researchers have started to explore the feasibility of using BCI to achieve continuous and fluent control. There are two contrasting understandings or definitions of "continuous" in BCI. On the one hand, some researchers view the "continuous" characteristic of a BCI system as the continuity of the output signal in time, which means that the time interval between two output signals should be short so that the user's intention is estimated and delivered continuously to achieve real-time online control of external devices in practice [11], [32], [33], [34]. This continuous output signal at each time moment takes the form of a class label, e.g., a discrete class label of 1, 2, 3, or 4. On the other hand, other researchers point out that the "continuous" characteristic of a BCI system is the continuity in the value of the output signals [35], [36], [37]. The output signal of such a continuous BCI is predicted by a regression approach, and the possible values are any real number within a range, e.g., the value of a one-dimensional output signal can be any real number in the range [−1, 1], rather than an integer selected from N classes as in a discrete BCI. In this study, we adopt the first idea: a continuous BCI system indicates the continuity of the output signal in time (the decision time / time interval should be 40-100 ms), and the value of the output signal is a discrete class label from N classes.
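The first notion of continuity, an output label every 40-100 ms computed from the latest data window, can be sketched as a moving-window loop. This is a schematic under assumed parameters (250 Hz sampling, 2.0 s window, 40 ms step, all illustrative); the classifier itself is left as a stub, since the paper uses CCA for that role.

```python
# Illustrative moving-window timeline: every `step` seconds the latest
# `win` seconds of EEG are re-decoded, so a discrete class label is
# output continuously in time. Parameters are assumed, not from the paper.
def decode_stream(n_samples, fs=250, win=2.0, step=0.04, classify=None):
    """Yield (t, label) pairs over a recording of n_samples samples.

    `classify` maps a (start, stop) sample range to a class label;
    it is a stub here (the paper uses CCA for this step)."""
    win_n, step_n = int(win * fs), int(step * fs)
    for stop in range(win_n, n_samples + 1, step_n):
        label = classify(stop - win_n, stop) if classify else None
        yield (stop / fs, label)
```

The first label can only appear once a full window of data exists, which is exactly why a short window speeds up the response but starves the decoder of evidence.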
In conventional continuous SSVEP-BCI, when subjects shift their gaze to a new target, the SSVEPs evoked by the new target need time to accumulate before it can be recognized, resulting in the transition state problem. During the transition state, the BCI system outputs erroneous and chaotic recognition results. The dwell time strategy [16], [28], [31] has been widely used to develop continuous SSVEP-BCI systems. This strategy checks the consistency of a number of consecutive decoding results, usually set to three, before a valid command is sent out. The new target must be recognized multiple consecutive times to increase output confidence. Therefore, although the dwell time strategy can stabilize the system's output and avoid random errors during the transition state, it further prolongs the system delay, thus decreasing the system response speed.
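The dwell time strategy described above can be sketched in a few lines. This is a generic illustration, not the cited systems' implementation; the function name and the three-sample setting follow the typical value mentioned in the text.

```python
# Sketch of the dwell-time strategy: a command is emitted only after the
# same label is decoded `dwell` consecutive times (3 here, the typical
# setting cited above); otherwise no command is sent out.
from collections import deque

def dwell_filter(labels, dwell=3):
    """Yield a valid command when `dwell` consecutive decodings agree,
    and None otherwise."""
    recent = deque(maxlen=dwell)
    for lab in labels:
        recent.append(lab)
        if len(recent) == dwell and len(set(recent)) == 1:
            yield lab
        else:
            yield None  # not enough consistent evidence yet
```

Note how the filter suppresses the chaotic labels of the transition state, but also delays the first valid output of the new target by `dwell − 1` decoding steps, which is exactly the added latency criticized above.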
There is a trade-off between the recognition accuracy and speed of the BCI system when choosing the window length T in continuous SSVEP-BCI [17], so improving the recognition performance at short window lengths could directly improve both the accuracy and the speed, and thus eliminate the transition state problem. However, state-of-the-art training-based SSVEP detection methods such as TRCA and TDCA have insufficient recognition performance in continuous SSVEP-BCI because the extracted EEG segments are not phase-locked to each other [19], [20] (see details in Section I).
On the other hand, Wang et al. [16] deliberately enlarged the distance between stimuli and recognized the transition state based on the head movement data when users shifted between stimuli, which was unnatural and increased the task burden.
EOG signals demonstrated excellent results in detecting gaze shifts. Although eye movements, e.g., changes of gaze position, can be accurately identified using EOG signals (see Figure 7 (a)), EOG suffers from baseline drift [38]. That is, EOG signals drift over time even if the subject keeps gazing at the same target, making it challenging to identify the target by directly determining the exact gaze position.

B. Comparison Between Conventional BCI and the Proposed Hybrid BCI
In this study, the unsolved transition state problem in continuous SSVEP-BCI was mitigated by the proposed hybrid BCI combining both SSVEP signals and EOG signals (saccades). In detail, when users switched their gaze to a new target, the detected saccade event served as an additional event to compute the posterior probability distribution of the targets based on Bayesian probability theory, and thus the new target could be recognized accurately. The results of both the offline and online experiments verified the efficacy of the proposed hybrid BCI. Note that the algorithms for detecting SSVEP and EOG, and the Bayesian approach combining SSVEP and EOG, are all calibration-free, which is important for practical applications and has the potential to promote plug-and-play BCI.
Compared with existing hybrid BCI studies, this study improved BCI performance by utilizing two physiological signals without increasing the task burden. In conventional hybrid BCIs combining SSVEP and EOG, users are required to perform a target-gazing task and intentional eye movements, e.g., blinks or winks, to output a valid command. The requirement of multiple tasks might not be natural, and it increases the user's workload. In contrast, the saccade, the EOG activity used in this study, occurs naturally and spontaneously during the target switching process and can be captured automatically from EOG signals. The user's task in the proposed hybrid BCI is the same as that in SSVEP-BCI: gazing at the desired target. Thus, the integration of EOG in this study does not increase the task burden. The subjective assessment results also indicated that the proposed BCI did not increase the task burden physically or mentally.
Besides, the response time (RT), i.e., the duration from stimulus onset to decision making and output, of the proposed hybrid system was shorter than in existing hybrid BCI studies. On the one hand, in existing hybrid BCIs combining SSVEP and EOG, a trial-by-trial structure was utilized and the EEG/EOG segment of the entire trial was extracted for classification. The RT to output a result was equal to the trial duration, e.g., 1.2s [39] or 3.9s [40]. On the other hand, in hybrid BCIs for continuous control, the task in one trial was to control the cursor to reach a destination [14], [31]. RT could not be measured and was not reported, since the trajectory to the destination was not constrained and thus the subjects' actual control intentions at each moment were inaccessible during the control task (the stimulus onset, i.e., the moment at which the subject started to gaze at the desired target, was unknown). Nevertheless, these systems might have had a relatively long RT caused by the transition state problem (as pointed out in the Introduction), since they all utilized SSVEP for continuous control and implemented the moving window strategy. In this study, the subjects' intention at each moment can be inferred from the visual cue in each trial, so that RT can be indicated by the gaze-shifting time. The RT of the SSVEP-BCI was relatively long due to the transition state problem (1.94 s), and the RT of the proposed hybrid BCI was significantly reduced to 0.93 s.

C. Explanation to the Experiment Results
Note that although the experiments designed in this study still utilized a trial-by-trial structure, our BCI system differed from the traditional discrete SSVEP-BCI in the following aspects. First, the aim of the experimental design was different. Since the target switching process (gaze shifting between stimuli) in continuous SSVEP-BCI causes a transition state problem, i.e., a period of erroneous recognition, the offline and online experiments in this study were designed to simulate this target switching process to explore a potential solution to the transition state problem. In contrast, traditional studies design experiments to improve the decoding accuracy of a discrete BCI. Second, the task in one trial and the trial duration were different. In this study, to simulate the gaze shifting process in continuous SSVEP-BCI, users were given two tasks in one trial, i.e., gazing at target 1 for 4-6s and then shifting to gaze at target 2 for another 4-6s, whereas in traditional studies users were instructed to gaze at one target for 2-4s in one trial. Moreover, the duration of one trial in our study was 11-15s, which is much longer than the 2-4s trial duration in traditional studies. Third, the signal processing method was different. In this study, the moving window approach was utilized to extract the latest 2-4s EEG segment over time to continuously estimate the user's intention, and thus multiple commands could be output in one trial. In traditional studies, the EEG segment of the entire trial was extracted for classification, and only one command was output in each trial. Lastly, the feedback was different. In the online experiment of this study, the visual feedback, i.e., the green square, continuously updated its position according to the latest decoding results based on the preceding 2-4s EEG data. Users received real-time feedback constantly during the trial.
In the traditional studies, the feedback appeared at the end of each trial based on the EEG classification of the entire trial and thus was provided once every 2-4s to users discretely.
We evaluated the BCI performance in terms of overall continuous accuracy and GST, as shown in Figure 8 (a) and Figure 8 (b). The continuous accuracy as a function of window length differed from the typical accuracy curves in previous discrete SSVEP studies, e.g., Figure 6 (a). This was mainly caused by the different settings among studies (see details in Section II-D). In previous discrete SSVEP studies, the subject was instructed to gaze steadily at a particular flickering stimulus in each trial. With an increasing detection window length, the decoder gained more information to determine the subject's mental state, and therefore the accuracy was higher. In this study, the overall continuous accuracy evaluates the recognition accuracy of both the transition state and the steady states before and after the transition. In detail, when the time window is short, the BCI system cannot recognize any target in such a short data segment, so the accuracy is low; as the time window increases, the system gains the ability to recognize the targets, so the accuracy increases. When the time window length increases further, the system can recognize the target more accurately, so the recognition accuracy in the steady states before and after the transition state increases. Nevertheless, the increased window length also introduces a longer erroneous/transition state during gaze shifting due to the mixture of the old target's SSVEP and the new target's SSVEP. Since the longer erroneous state has a greater impact than the performance improvement in the two steady states, the overall continuous accuracy decreases. Due to this transition state, the overall accuracy first increased and then decreased with increasing window length T. Similar results were also found in [32], which also took the transition state into consideration when calculating the accuracy in continuous SSVEP-BCI.
On the other hand, the gaze-shifting time was used to measure the delay for the system to recognize the new target, i.e., to depict the length of the erroneous state. When the window length was short, the BCI system could ideally respond quickly, i.e., with a short gaze-shifting time, but the system could not recognize any targets with such a short window, and therefore the gaze-shifting time was still large; with increasing window length, although the transition state duration was slightly longer, the overall gaze-shifting time decreased because the system became able to recognize the targets. Once the system was fully able to recognize the targets accurately, further increasing the window length caused a longer transition state, and therefore the gaze-shifting time increased.
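The GST measurement can be sketched from a decoded label timeline. This is a hedged illustration with assumed names and inputs: it takes the first correct output after the cue switch as the recognition moment, whereas the paper's exact definition may additionally require consecutive correct outputs.

```python
# Hypothetical sketch of measuring gaze-shifting time (GST): the delay from
# the cue switch until the decoder first outputs the new target. The
# function name and the first-hit criterion are illustrative assumptions.
def gaze_shifting_time(times, labels, switch_t, new_target):
    """times, labels -- decoded output timeline (seconds, class labels)
    switch_t      -- time of the visual cue switch
    new_target    -- the target the subject shifted to
    Returns the delay until the first correct output, or None."""
    for t, lab in zip(times, labels):
        if t >= switch_t and lab == new_target:
            return t - switch_t
    return None
```

With this definition, a longer erroneous state after the cue switch translates directly into a larger GST, which is the behavior traced across window lengths above.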
An example of the change of the overall continuous accuracy and the GST with increasing window length is illustrated in Figure 9. When the window length T = 0.5s, data epochs were too short for the system to identify any targets even when the subjects gazed consistently at a target. As a result, the detection was chaotic, the continuous accuracy was low, and the GST was high. When the window length was longer, e.g., T = 1s, data epochs were long enough to decode targets, so the continuous accuracy became higher and the GST lower. With a further increase in window length, the system was able to recognize the targets more accurately, but the duration of the erroneous state caused by gaze shifting was also longer, leading to an overall lower continuous accuracy and higher GST.
Continuous BCI systems enable subjects to output control commands continuously without the restriction of a predefined trigger. In this study, the eye movement detection of saccades is fused with SSVEP decoding to significantly improve continuous BCI performance. This improvement is significant in terms of accuracy and GST and was also perceived by the subjects. The fact that no significant differences between the hybrid approach and the conventional SSVEP method were found in the questionnaire results for questions 3-7 implies that the hybrid approach did not increase the task burden physically or mentally. The subjective assessments of the hybrid approach on question 1 (How fast did the feedback update to the new target?) were superior to those of the conventional method (see Figure 12). This implies that subjects could have a better control experience in real applications where a quick response is preferred, such as a BCI-controlled wheelchair. Meanwhile, the average score of the hybrid approach on question 2 (How accurate was the feedback provided when gazing at the stimulus steadily?) was lower than that of the conventional method. The reason might be that, in the hybrid Bayesian approach, false positive detections of saccades led to short periods of misclassification and an unstable feeling for subjects. However, the influence of these short periods of misclassification on continuous accuracy was compensated by the faster target recognition after gaze shifting. Thus both the continuous accuracy and the GST showed an overall significant improvement.

D. Potential for Further Improving Performance
The system needs to be further improved to achieve practical continuous BCI control. First, the Bayesian estimation might be improved by finding an optimized probability distribution P(E|S_i), e.g., using a grid search to find the optimal probability estimates. Second, the integration of EOG in this study helps to identify the new target more quickly during the transition state, but the use of EOG also has the disadvantages of extra cost and inconvenience during the preparation stage and online use of the BCI system. Moreover, fatigue after long-term use might affect the detection of saccades, and thereby the performance of the proposed hybrid BCI might deteriorate. Therefore, exploring the feasibility of using EEG channels only to solve the transition state problem is meaningful and necessary. Figure 10 shows that in the continuous SSVEP, the correlation of the new target (the stimulus indicating left) rose, whereas the correlation of the previous target (the stimulus indicating up) declined after gaze shifting. This tendency may be used to directly identify the new target and will be investigated in future studies. Third, we plan to test the proposed hybrid BCI with more stimuli, such as six or 12, so that the system can be used for controlling external devices with multiple degrees of freedom, such as a quadcopter. Lastly, we plan to explore idle state detection methods and integrate them into the proposed hybrid BCI so that the system is able to continuously and asynchronously estimate the user's intention.

V. CONCLUSION
A novel calibration-free decoding approach for continuous BCI control was proposed in this paper by hybridizing SSVEP and EOG decoding. The erroneous state during gaze shifting is shortened by incorporating saccade detection, which serves as an additional event in the Bayesian approach to optimize the probability distribution of the commands obtained from the SSVEP detection method. Both offline and online experiments validated the feasibility of the proposed method. In the offline experiments, the proposed method showed significantly higher continuous accuracy and shorter gaze-shifting time compared to existing SSVEP detection methods. In the online experiments, the proposed hybrid BCI significantly outperformed the SSVEP-BCI in terms of continuous accuracy (77.61 ± 1.36% vs. 68.86 ± 1.08%) and gaze-shifting time (0.93 ± 0.06s vs. 1.94 ± 0.08s). Importantly, further subjective assessments corroborated these positive results. In future work, we will improve the efficacy of continuous BCI by optimizing the detection methods and will extend the proposed method to brain-controlled external devices to further prove its efficacy.