Evaluation of Optimal Stimuli for SSVEP-Based Augmented Reality Brain-Computer Interfaces

Steady-State Visually Evoked Potentials (SSVEPs) serve as one of the most robust Brain-Computer Interface (BCI) paradigms. Being an exogenous brain response, the properties of elicited SSVEPs are directly related to the properties of the visual stimuli. However, studies on integrating BCI and Augmented Reality (AR), aimed at realising mobile BCI systems, have mainly focused on applications of BCIs and performance comparison with screen-based BCIs. Little work has been done to study the effects of stimulus parameters on BCI performance when stimuli are presented with an AR headset. Here, we compare AR-based SSVEP with 3D and 2D stimuli using three different stimulation strategies: flickering, grow-shrink, and both. Participant feedback on level of fatigue and their subjective preference of stimuli were also collected. Our results did not show significant differences in classification accuracies between the 2D and 3D stimuli. However, for most of the participants, classification accuracy with flickering stimuli was above their average performance and stimuli that changed only in size were below average. The participants were divided in terms of which type of stimulus they felt was the most comfortable.

The translation of SSVEP research to practical daily life applications requires investigation of mobile systems. Traditionally, to capture objects of interest in the same field-of-view as the stimuli, either LEDs are placed around the objects or a camera is used to capture the scene and present it on a computer screen along with the SSVEP stimuli. Both of these methods are limited in the number of objects and different scenarios that can be captured and presented. To this end, Augmented Reality (AR) combined with computer vision can be employed to tag any target objects in the user's field of view by superimposing SSVEP stimuli on them. Systems based on AR usually rely on a spatially aware (through multiple cameras placed on the headset), optical see-through display to project virtual content onto the real-world space. AR has numerous benefits for real life, as it provides users with real-time interactive experiences in which real-world objects are enhanced and superimposed with computer-generated information across various sensory modalities, such as auditory, visual, and haptic. Combining BCIs with AR devices can enable direct brain interaction with the real world through AR, such as controlling the movement of a robot [27], [28], [29] or of virtual objects like a game avatar [30]. AR also complements BCIs well since users can receive real-time feedback of their intention while simultaneously executing a BCI command without needing to shift their gaze [31]. This property makes the systems more intuitive to use and suited to applications such as smart home control [32], [33], [34], [35] and rehabilitation [36].
So far, AR capabilities have not been fully utilized in SSVEP-BCI studies for visual stimulation. With see-through AR headsets, a wide range of two-dimensional (2D) and three-dimensional (3D) virtual content with control over their visual properties can be projected onto the real world. Since AR technology is relatively new to the BCI field, a substantial portion of studies relating to SSVEP-based BCIs have focused on BCI performance comparisons between computer screens and AR as visual stimulators [19], [28], [31], [37]. These studies have shown the performance of AR-BCI to be comparable to computer screens and the technology to be feasible for development of SSVEP-based AR-BCI.
There remains potential to design interactive and engaging stimulation paradigms with AR displays. One example is the grow-shrink stimulus introduced by Park et al. [33], which changed both size and luminance to improve classification accuracy in their 'smart-home' BCI. Modulation of stimulus size at a fixed frequency induces Steady-State Motion Visual Evoked Potentials (SSMVEP). When combined with modulation of brightness, both SSVEP and SSMVEP are induced, resulting in higher BCI accuracy. They used 2D tiles shaped as stars for stimulation. To our knowledge, only 2D planes/tiles have been used as stimuli in AR-BCI studies so far. However, when viewed through the semi-transparent AR display, planes appear translucent, and the user may easily lose attention and be distracted by surrounding objects that are visible through the projected ones [28]. The shift in gaze from the stimulus to the object behind it would potentially affect BCI task performance.
Screen-based SSVEP studies have explored various stereoscopic (3D) stimuli in comparison to planes/tiles (2D); however, use of holographic 3D stimuli for AR-BCI largely remains unexplored. Chien et al. [38] conducted a study on SSVEP in 3D displays on computer screens and reported that, if low disparity is maintained, stereoscopic 3D stimuli can lead to a higher degree of attention. It was observed by Mun et al. [39] that the 3D stimuli used in SSVEP-based BCI systems engaged users' attention and motivation while decreasing task response time. Another study conducted by Han et al. [40] concluded that stereoscopic motion stimulation elicits significantly higher amplitude SSVEP responses than its 2D counterpart. All of these comparison studies were conducted on computer screen displays and need to be validated with holographic virtual content. With AR displays, holographic 3D stimuli can be designed for eliciting SSVEP responses that are engaging, appear opaque through the head-mounted display (HMD), and may yield better performance compared to their 2D counterparts. Moreover, with 3D stimuli, the number of targets can be increased by anchoring stimuli spatially in 3D space. Targets separated cross-sectionally can be tagged with separate stimuli at different locations in the z-axis (distance from the user) but similar x and y coordinates (the horizontal and vertical visual planes, respectively) by appropriately adjusting their sizes to maintain the same visual angle for similar amplitude SSVEP responses [41]. Users can simply shift the focus of their gaze to select one of the targets [42], [43] located at different depths. However, no study has evaluated the BCI performance of 3D SSVEP stimuli in AR settings and it is unknown how these compare with 2D stimuli.
In this study, we compared 3D and 2D SSVEP stimuli presented on the HMD using three different stimulation strategies, with a view to potential future BCI applications. We used flickering stimuli (i.e., brightness changing), size-changing stimuli, and size-and-brightness changing stimuli. It has been reported in BCI user studies that the flashing of stimuli in SSVEP quickly fatigues the participants and prolonged exposure is often uncomfortable [44]. Therefore, alongside flashing stimuli, we included stimuli that vary in size only during stimulation to compare the performance of the two. User experience was collected in a post-experiment questionnaire to gauge participants' fatigue and comfort while they performed the experiment.

A. STIMULATION PARADIGM
There were six different types of stimuli used in the experiment, and for each stimulus there were two periodically changing characteristics: brightness and size. Three categories of stimuli were used with both 2D and 3D shapes: i. Flashing Stimulus (FS) that changed in brightness only, ii. Grow-Shrink Stimulus (GSS) that cycled between 1° and 6° in visual angle measured edge-to-edge, and iii. Grow-Shrink and Flashing Stimulus (GSFS) that varied simultaneously in brightness and size. The GSFS exhibited maximum brightness at maximum size (6°) and minimum brightness at minimum size (1°). The visual stimulation paradigm was written using C# in the Unity 3D (Unity Technologies, USA) engine and was run on HoloLens version 2 (Microsoft Inc., Redmond, WA, USA).
Both brightness and size were modulated by a sampled sinusoid s(f, t), where s is a stimulus property (i.e., brightness and/or size) at frequency f and time t, A is the peak-to-peak amplitude of the stimulus, and c is the offset that determines the minimum brightness or size. For size-changing stimuli, size changed in all of the defined (2D or 3D) dimensions. The stimulus shapes and layout are shown in Fig. 1. The size of FS was set to the mean of the maximum and minimum cross-sectional areas of GSFS for both 2D and 3D FS, which was 4.46° in visual angle.
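The modulation described above can be sketched in Python as follows. The functional form used here, s = c + (A/2)(1 + sin(2πft)), is an assumption chosen only because it matches the stated roles of A (peak-to-peak amplitude) and c (minimum value); the phase, the function name `modulate`, and its parameter names are illustrative and not taken from the original implementation, which was written in C# for Unity.

```python
import math

FRAME_RATE_HZ = 60.0  # rendering rate reported for the HoloLens in this study


def modulate(f_hz, frame_index, peak_to_peak, offset, frame_rate_hz=FRAME_RATE_HZ):
    """Value of a stimulus property (brightness or size) at one display frame.

    Assumed form: s = offset + (peak_to_peak / 2) * (1 + sin(2*pi*f*t)),
    which keeps s within [offset, offset + peak_to_peak].
    """
    t = frame_index / frame_rate_hz  # time of this frame in seconds
    return offset + 0.5 * peak_to_peak * (1.0 + math.sin(2.0 * math.pi * f_hz * t))


# Size (in degrees of visual angle) of a 15 Hz grow-shrink stimulus that
# cycles between 1 and 6 degrees, sampled over one cycle (4 frames at 60 Hz).
sizes = [modulate(15.0, n, peak_to_peak=5.0, offset=1.0) for n in range(4)]
```

Under this assumed form, a GSFS stimulus would call the same function twice per frame, once for size and once for brightness, so that both properties peak together.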

B. PARTICIPANTS
Twelve healthy participants (7 females, 5 males) with normal or corrected-to-normal (with glasses or contact lenses) vision volunteered to take part in this research study. This study was approved by the University of Melbourne Human Research Ethics Committee (Approval Number 2057895). Signed written consent was obtained from each participant prior to commencement.

C. TIMING AND FREQUENCY
Five integer stimulus frequencies, 12, 13, 14, 15, and 16 Hz, were used. We deliberately avoided frequencies in the lower alpha band (8-11 Hz) to avoid the influence of spontaneous alpha activity. The five stimuli were equidistantly placed on a transparent circle that measured 10° in diameter with their centers lying on the circumference, as shown in Fig. 1(a). The center of this circle was fixed to the center of the participant's field of view (FOV) at the start of the experiment and remained at the same position throughout the experiment. This configuration was chosen to make the experimental setting closer to a real-world scenario, where objects are usually stationary. Four trials were recorded for each frequency, totaling 20 trials for each type of stimulus. All participants were tested with the six types of stimuli, as illustrated in Fig. 1(c-h), which totaled 120 trials recorded from each participant. The timing of each trial is shown in Fig. 2(a). All the trials and blocks were randomised across participants. The participant was shown a cue, a small red sphere, that appeared for 0.75 s at the position of the target stimulus (Fig. 1(b)). Participants were asked to keep looking at the cue location, which was then replaced with a white stimulus shown for 5 s concurrently with the other four non-target stimuli. After the stimulus presentation finished, auditory feedback (duration 0.15 s) was provided to participants: a short melody was played if the target frequency was correctly detected in the EEG signal and a buzzing sound was played if it was not. The purpose of the feedback was to retain user engagement and help participants focus on the experimental task. A rest period of 2 s followed the feedback. Fig. 2(b) shows the time distribution across the different blocks of the experiment. The experiment was split into six blocks to provide regular breaks. During each block, all the trials of only one type of stimulus were presented.
The sequence of stimulus blocks and the target frequencies in trials were both randomised for each participant to minimise order effects. A break of 1.5 min was provided between consecutive blocks (Fig. 2(b)).
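As a rough illustration of the layout and timing described in this section, the following Python sketch places five stimulus centres on a circle of 10° diameter and tallies the trial counts and durations. The starting angle of 90° (first stimulus at the top) and all names are assumptions for illustration; the actual orientation is defined by Fig. 1(a).

```python
import math

FREQS_HZ = [12, 13, 14, 15, 16]
RADIUS_DEG = 5.0  # circle of 10 degrees diameter, in visual angle


def stimulus_centres(n=5, radius=RADIUS_DEG, start_angle_deg=90.0):
    """Equally spaced centres on the circle; start_angle_deg is an assumption."""
    centres = []
    for k in range(n):
        theta = math.radians(start_angle_deg + 360.0 * k / n)
        centres.append((radius * math.cos(theta), radius * math.sin(theta)))
    return centres


# Trial timing taken from the text: cue + stimulation + feedback + rest.
TRIAL_S = 0.75 + 5.0 + 0.15 + 2.0

trials_per_block = len(FREQS_HZ) * 4   # four trials per frequency
total_trials = trials_per_block * 6    # six stimulus types, one block each
block_s = trials_per_block * TRIAL_S   # duration of one block, breaks excluded
centres = stimulus_centres()
```

With these values each trial lasts 7.9 s, so a block of 20 trials takes 158 s before the 1.5 min break.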

D. HARDWARE SETUP AND SYSTEM ARCHITECTURE
EEG was recorded in a Faraday shielded room with g.USBamp and g.Sahara dry electrodes (g.tec medical engineering GmbH, Austria) sampled at 512 Hz. A notch filter at 50 Hz for removal of line-noise and a bandpass filter with 0.5-60 Hz pass band were applied in the g.USBamp's data acquisition software package during EEG recording. EEG was measured at six electrode sites according to the 10-20 international system: PO3, POz, PO4, O1, Oz and O2. Long leg electrodes were used for participants with thick hair and short leg electrodes for others to ensure good contact with the skin. Reference and ground were placed on the right mastoid and left mastoid, respectively, using adhesive electrodes.
Participants sat approximately 1 m from a black background, measured at their eye level (Fig. 2(c)). After the EEG cap and electrodes were fitted, participants wore the HoloLens over the top of them. Foam padding was inserted at the sides to prevent the HoloLens from pressing onto the electrodes. Participants were asked during the breaks to confirm that they remained comfortable throughout the experiment.
SSVEP stimuli were projected using the HoloLens, rendered at a frame rate of 60 Hz. To ensure that the augmented projections of stimuli were presented at the intended locations, the HoloLens was calibrated to each participant's eyes using a built-in calibration routine. Event triggers were sent to a Windows PC from HoloLens as UDP packets via Wi-Fi and were received in a Simulink (MathWorks Inc., USA) model recording the EEG.

E. QUESTIONNAIRE
At the end of the experiment, participants were asked to complete a questionnaire to report their subjective evaluation of fatigue and experience wearing the HoloLens and EEG dry electrodes for the duration of the experiment. The questions asked were: 1) Was the flickering of the stimulus annoying? 2) Was the flickering of the stimulus fatiguing? 3) How strenuous was the experimental task using the HoloLens device? 4) Do you feel any discomfort in the eyes? 5) Did you feel dizzy? 6) Which stimulus were you the most comfortable with? 7) Would you be comfortable using the HoloLens if the experiment extended for more than an hour? Questions 4, 5, and 7 required a binary response while the other questions' responses were recorded on a five-point scale as shown in Table 1.

F. DATA ANALYSES
1) ONLINE PROCESSING
During the experiment, event triggers sent via UDP to the MATLAB Simulink model identified the five-second EEG segments related to the SSVEP response. Each segment was stored in a buffer and decoded using Canonical Correlation Analysis (CCA) [45] at the end of the stimulation period to provide online auditory feedback to the participant during the experiment. All six electrode channels were used for online decoding.

2) OFFLINE PROCESSING
Epochs of 5 s duration of EEG corresponding to stimulation periods were extracted using the event triggers that labelled the start and end of the stimulation. The data were band-pass filtered with a pass-band between 6 and 60 Hz using the 'bandpass' function in MATLAB with 'ImpulseResponse' set to 'auto', 'Steepness' set to 0.85, and 'StopbandAttenuation' set to 60 dB. During the post-processing of EEG data, unexpected 15 Hz and 30 Hz noise was observed on some channels. Because 15 Hz is also a stimulation frequency tested in this experiment, this noise substantially reduced the classification accuracy. The channels with this noise were not consistent amongst participants. To avoid the effect of this noise, a combination of three optimal electrodes was determined for each participant using the approach adopted by Park et al. [33], so as to remove most, if not all, of the contaminated channels. For each participant, classification accuracy was calculated for all possible combinations of three electrodes, and the combination that yielded the highest classification accuracy was selected as the optimal electrode combination and used for offline analysis.
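The exhaustive search over three-electrode combinations can be sketched in Python as below. With six recording sites there are 20 such combinations. The callable `accuracy_fn` is a hypothetical placeholder for whatever routine computes a participant's classification accuracy on a given channel subset (for example, CCA over all of that participant's trials); the toy scoring function in the usage lines is invented purely to exercise the search and does not represent any measured data.

```python
from itertools import combinations

CHANNELS = ["PO3", "POz", "PO4", "O1", "Oz", "O2"]  # recording sites in this study


def best_channel_subset(accuracy_fn, channels=CHANNELS, k=3):
    """Return the k-channel subset with the highest accuracy, and that accuracy.

    accuracy_fn is a hypothetical callable that maps a tuple of channel names
    to a classification accuracy in [0, 1].
    """
    best = max(combinations(channels, k), key=accuracy_fn)
    return best, accuracy_fn(best)


# Toy scoring function (illustrative only): pretend O1 is contaminated by
# noise, so any subset containing it scores lower.
def toy_accuracy(subset):
    return 0.8 - 0.3 * ("O1" in subset) + 0.05 * ("Oz" in subset)


n_subsets = len(list(combinations(CHANNELS, 3)))
subset, accuracy = best_channel_subset(toy_accuracy)
```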

3) CANONICAL CORRELATION ANALYSIS
Canonical Correlation Analysis (CCA) was used for decoding. For SSVEP classification, the two inputs for the CCA algorithm were the processed EEG trial data and a set of reference signals $Y_f$ composed of sine and cosine waves of the fundamental stimulation frequency f and its harmonics [45]:

$$Y_f = \begin{bmatrix} \sin(2\pi f t) \\ \cos(2\pi f t) \\ \vdots \\ \sin(2\pi n_h f t) \\ \cos(2\pi n_h f t) \end{bmatrix}$$

where $n_h$ is the number of harmonics included in the reference set. The CCA algorithm determines a set of linear combinations of the two inputs such that the correlation between them is maximised. This process was repeated using a reference signal set for each frequency and the correlation coefficient was determined. The frequency yielding the highest correlation with the EEG trial data was selected as the frequency decoded for that trial. Performance evaluation metrics were calculated from the results of performing CCA on all trials.
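A minimal Python sketch of this decoding step is given below, using NumPy. It computes the largest canonical correlation through a QR/SVD formulation, which is one standard way of evaluating CCA and is not necessarily the routine used in the study. The number of harmonics (two) is an assumption, since the value used by the authors is not stated here, and the function names are illustrative. The trial at the bottom is synthetic and serves only to show the call pattern.

```python
import numpy as np


def reference_signals(f_hz, n_harmonics, n_samples, fs_hz):
    """Sine/cosine references at f_hz and its harmonics.

    Returns an array of shape (n_samples, 2 * n_harmonics).
    """
    t = np.arange(n_samples) / fs_hz
    columns = []
    for h in range(1, n_harmonics + 1):
        columns.append(np.sin(2 * np.pi * h * f_hz * t))
        columns.append(np.cos(2 * np.pi * h * f_hz * t))
    return np.column_stack(columns)


def max_canonical_correlation(x, y):
    """Largest canonical correlation between the columns of x and y.

    After centring, the canonical correlations equal the singular values of
    Qx^T Qy, where Qx and Qy are orthonormal bases of the two column spaces
    (assumes both inputs have full column rank).
    """
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    return float(np.linalg.svd(qx.T @ qy, compute_uv=False)[0])


def classify_trial(eeg, freqs_hz, fs_hz, n_harmonics=2):
    """Return the candidate frequency whose references correlate best with eeg.

    eeg has shape (n_samples, n_channels); n_harmonics=2 is an assumption.
    """
    scores = [
        max_canonical_correlation(
            eeg, reference_signals(f, n_harmonics, eeg.shape[0], fs_hz)
        )
        for f in freqs_hz
    ]
    return freqs_hz[int(np.argmax(scores))], scores


# Synthetic trial (not experimental data): three noisy channels carrying a
# 14 Hz component, 5 s long at the 512 Hz sampling rate used in the study.
fs_hz, duration_s = 512, 5
rng = np.random.default_rng(0)
t = np.arange(fs_hz * duration_s) / fs_hz
eeg = np.outer(np.sin(2 * np.pi * 14 * t + 0.3), [1.0, 0.8, 0.6])
eeg = eeg + rng.standard_normal(eeg.shape)
detected, scores = classify_trial(eeg, [12, 13, 14, 15, 16], fs_hz)
```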

G. PERFORMANCE EVALUATION
Performance in this study was evaluated using target Classification Accuracy (CA) and Information Transfer Rate (ITR). The ITR, B, as defined by McFarland and Wolpaw [46] is:

$$B = \frac{60}{T}\left[\log_2 N + P\log_2 P + (1 - P)\log_2\left(\frac{1 - P}{N - 1}\right)\right]$$

in bits/min, where N is the total number of possible outcomes, P is the probability of selecting the desirable output (i.e., classifier accuracy), and T is the total time (in seconds) required to make a selection.
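The ITR definition can be evaluated with a short Python function such as the sketch below. It implements the standard Wolpaw formula with T expressed in seconds; whether T in the study included cue, feedback, and rest periods in addition to the stimulation window is not stated here, so the value passed as `selection_time_s` is left to the caller. The function name and argument names are illustrative.

```python
import math


def itr_bits_per_min(n_targets, accuracy, selection_time_s):
    """Wolpaw information transfer rate in bits/min.

    n_targets: number of possible outcomes N (at least 2)
    accuracy: probability P of a correct selection, in [0, 1]
    selection_time_s: time T per selection, in seconds
    """
    if n_targets < 2:
        raise ValueError("n_targets must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if selection_time_s <= 0:
        raise ValueError("selection_time_s must be positive")

    p = accuracy
    bits = math.log2(n_targets)
    if p > 0.0:
        bits += p * math.log2(p)  # the term vanishes in the limit p -> 0
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_targets - 1))
    return bits * 60.0 / selection_time_s


# Five targets and a 5 s selection window, as in this study.
itr_perfect = itr_bits_per_min(5, 1.0, 5.0)  # upper bound for N = 5, T = 5 s
itr_chance = itr_bits_per_min(5, 0.2, 5.0)   # chance-level accuracy
```

At chance-level accuracy (P = 1/N) the bracketed term is zero, which is why a classifier at 20% accuracy with five targets transfers no information under this definition.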

H. STATISTICAL ANALYSIS
The Kolmogorov-Smirnov test confirmed that participants' classification accuracies followed a normal distribution. A Linear Mixed Effects (LME) model was fitted on classification accuracies with a fixed effect of stimulus and a random effect of participant to capture the variability within the participants and assess the overall effect of type of stimulus on BCI performance. One-way ANOVA was performed on the LME model and subsequent pairwise comparisons were carried out to investigate the differences between individual stimuli for statistical significance, with Tukey adjustment for multiple comparisons.

III. RESULTS
For offline analysis, the combination of three electrodes optimised by classification accuracy was used. Table 2 lists the best combination of three channels identified based on the CA for each participant. Each participant's classification accuracy with all types of stimuli is plotted in Fig. 3. The plot highlights both inter-participant and intra-participant variability. For each participant, the stimulus they reported as the most comfortable to view in the questionnaire (Table 4) is marked with a cross ('x') on the plot. The stimulus type with high accuracy matched the preferred stimulus for some participants, such as Participants 6 and 12, but not for most. For the majority of the participants, classification accuracy for the GSS stimulus was lower than their performance for other stimuli. The average CA of each participant was also evaluated (Fig. 4), which showed large variation between the participants. Fig. 5 shows the difference between each participant's classification accuracy for each type of stimulus and their own mean performance. For both 3D and 2D FS, most participants performed better than their average accuracy, while CA was below average for the majority of the participants with 3D and 2D GSS. To avoid introducing a bias in statistical tests, Participant 1 was removed from statistical analyses since their classification accuracy was only at the chance level (20%), as shown in Fig. 4 (dashed line), and so was considered an outlier.
To evaluate the performance of the different types of stimuli, average classification accuracy of each stimulus over all trials was calculated for Participants 2-12 (Fig. 6). 3D GSFS, 3D FS, and 2D FS yielded the highest accuracies. Classification accuracies for 2D and 3D GSS were lower than for other types of stimuli. One-way ANOVA of classification accuracy with participants and stimuli as factors showed no significant differences between stimuli (F(5,50) = 2.13, p = 0.077). Subsequent post-hoc pairwise comparisons performed on the linear mixed model of the classification accuracies, where participants were kept as a random effect, also did not reveal any significant differences between the six types of stimuli or any of the 2D and 3D pairs (Table 3). However, on closer inspection of the p-values from the multiple comparisons, the comparisons of GSS with FS yielded much lower p-values than the other pairwise comparisons.
Frequency is an important factor that affects the performance of SSVEP-BCI. In Fig. 7, the average classification accuracies of all five frequencies yielded by each stimulation strategy are plotted. As found in previous work, accuracy tended to decline with increasing frequency [47], [48], [49], [50]. Time taken to accurately decode an EEG signal and produce a system outcome indicates the potential response rate and practicality of a BCI system. For evaluation of the optimal stimulation time for each type of stimulus, the average CA was calculated for all participants using time windows of 0.5 s to 5 s in steps of 0.5 s (Fig. 8). The results show that the 2D and 3D pairs of GSS and GSFS both yielded approximately the same classification accuracies for all window lengths except 0.5 s. CA for 2D-FS increased at a steeper rate than other types and it yielded the same average accuracy at 3 s as 2D-GSS and 3D-GSS showed at 5 s. After 3.5 s, both 3D and 2D-FS had similar CA. Overall, 2D-FS achieved the best classification accuracy and ITR for most window lengths while GSS was the poorest.
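The window-length analysis can be sketched in Python as follows: each 5 s epoch is truncated to windows of 0.5 s to 5 s in 0.5 s steps before decoding. This sketch assumes that every window starts at stimulation onset, which is not stated explicitly in the text, and the helper names are illustrative.

```python
FS_HZ = 512  # EEG sampling rate used in this study


def window_lengths_s(start=0.5, stop=5.0, step=0.5):
    """Window lengths in seconds, from start to stop inclusive."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


def truncate_epoch(epoch, window_s, fs_hz=FS_HZ):
    """Keep the first window_s seconds of an epoch (a sequence of samples).

    Assumes the window is anchored at stimulation onset.
    """
    return epoch[: int(round(window_s * fs_hz))]


windows = window_lengths_s()
samples_per_window = [int(round(w * FS_HZ)) for w in windows]

# Placeholder epoch of 5 s (sample indices only, not EEG data).
epoch = list(range(5 * FS_HZ))
first_half_second = truncate_epoch(epoch, 0.5)
```

Each truncated epoch would then be passed to the same CCA classifier, giving one accuracy value per stimulus type and window length.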
The responses of the post-experiment questionnaire are tabulated in Table 4. In the questionnaire, five out of the 12 participants reported 2D or 3D GSS as their preferred choice of stimulus. Similarly, five out of 12 participants reported FS and two reported 3D GSFS as their preferred stimulus. No participant marked 2D GSFS as the stimulus of their choice in terms of visual comfort. Furthermore, eight out of 12 participants reported post-experiment fatigue (responses 3-5) while the remaining four did not report experiment-related fatigue (responses 1-2). Except for Participant 11, no participant agreed to wear the dry electrodes and HoloLens for any longer than the duration of the experiment, which ranged from 50-70 minutes.

IV. DISCUSSION
A. PERFORMANCE COMPARISON OF 2D AND 3D STIMULI
The performance of three types of 2D and 3D stimuli was evaluated in this study. For a given stimulation paradigm, 2D and 3D stimuli yielded similar classification accuracy (CA) and information transfer rate (ITR) for all time windows. Slightly larger differences in performance were observed for the Flashing Stimulus (FS) in terms of both CA and ITR. 2D-FS yielded higher average CA than 3D-FS for time windows shorter than 4 s. Similarly, for window lengths of less than 2.5 s with the Grow-Shrink and Flashing Stimulus (GSFS), 2D-GSFS showed higher CA than 3D-GSFS. However, these differences were not statistically significant. We speculate that this may be because the 2D and 3D stimuli both cover the same area in the visual field, which has a greater impact on the evoked response than the addition of a third dimension to the stimuli; hence, the visually evoked responses do not differ significantly. It is also not evident from the questionnaire responses that either 2D or 3D stimuli were consistently considered more comfortable by participants. Overall, the results of this experiment indicate that the 2D and 3D variants of a given strategy yield similar BCI performance.

B. PERFORMANCE COMPARISON OF STIMULATION STRATEGIES
Comparing the three stimulation strategies, CA was consistently lower for stimuli that changed in size only (Grow-Shrink Stimulus, GSS). GSS elicits Steady-State Motion Visual Evoked Potentials (SSMVEP) only, and while it was the preferred choice for five of the 12 participants (about 42%), the evoked response was less accurately decoded. In contrast, the decoding results from FS and GSFS were comparable to one another, and no significant differences were found between the two, contrary to the results of Park et al. [33], who reported that GSFS performed significantly better than FS.
The overall mean accuracy in this study was also lower than the values reported by Park et al. The discrepancy between our results and those of Park et al. could be due to the use of dry electrodes, the layout of the stimuli, and the absence of a fixation point. In a study with 102 participants, Zhu et al. [51] showed that CA for the same stimulation conditions in SSVEP using dry electrodes can differ by up to 20% from wet electrodes. In this experiment, the overall average accuracy for GSFS at 5 s of stimulation was 75.2% compared to 92.8% achieved by Park et al., in accordance with Zhu et al.'s conclusion.
For another SSVEP-BCI system, in which dry electrodes were used to record three-channel SSVEP, Farmaki et al. [52] reported an average accuracy of 80.2% and a lowest accuracy of 46%. Although dry electrodes facilitate easy placement and removal of EEG electrodes, the technology still requires improvement to match the output of wet electrodes.
Secondly, the placement and composition of targets also affect accuracy. In our study, the stimuli were placed equidistantly on a circle. For a size-changing stimulus, motion is perceived as the adjacent stimuli grow and shrink in size. A moving object in the periphery of the target stimulus is more distracting for participants than a stationary flickering object, especially when there is no focus point to direct their attention towards the target during stimulation. The moving adjacent stimuli appear to be coming closer to and moving further away from the target and may divert the participant's attention. The pictures of home appliances superimposed on the stimuli in Park et al.'s study also act as anchoring points for participant attention, thereby enhancing performance for GSFS. SSVEP-AR studies that report high classification accuracy are application-based, meaning that the different stimuli presented are associated with different commands. For example, Ke et al. [28] and Zhang et al. [27] used SSVEP targets (eight in the case of Ke et al.) to control a robotic arm, and both reported accuracies above 90%. Associating a stimulus with a task can help users retain their attention.

C. QUESTIONNAIRE RESULTS
An important finding from the questionnaire was the reluctance of participants to wear the head-mounted display on top of dry electrodes for longer durations. This response was independent of experiment-induced fatigue: participants who did not report fatigue also disapproved of wearing the headset for long durations. The bulkiness of the HoloLens and the shape of the dry electrodes both contribute to this. As the dry electrodes make contact with the scalp through thin cylindrical legs, a small amount of force pushing onto the electrodes translates into large pressure at the back of the head, leading to discomfort. Although foam padding was inserted during the experiment to create a gap between the headset and the electrodes, the overall experience was still unpleasant for the majority of the participants. This indicates that the combination of dry electrodes with head-mounted displays is not ideal for long durations and that wet electrodes may be better suited.

D. EFFECT OF FREQUENCY
In terms of CA, 3D GSFS and 3D FS were the most stable amongst all stimuli and yielded consistent performance for 12-15 Hz. When evaluated for each stimulation frequency, CA varied considerably. As observed in Fig. 7, average CA decreased as the frequency increased except at 15 Hz, where the CA rose and had almost the same average value across all the stimuli. It is worth noting that 15 Hz is also a divisor of the HoloLens's refresh rate (60 Hz), which may have impacted the EEG recording. However, SSVEP studies with stimulation frequencies in the range of 12-18 Hz have shown that a local maximum in the EEG power distribution is typically observed around 15 Hz [13], [53]. Classification accuracy at the maximal frequency of 16 Hz was the lowest for all stimuli, following the decreasing pattern of accuracy with increasing frequency. Our results are consistent with studies in the literature that also reported a decrease in CA or signal-to-noise ratio for increasing stimulation frequency beyond 10 Hz [47], [48], [49], [50].

E. FUTURE WORK
Using AR displays, 3D stimuli can be laid out and anchored in 3D space to increase the number of stimuli presented simultaneously. Future experiments could explore a presentation setup where stimuli are anchored at different viewing distances from the participant. Previous experiments with a set of LEDs have shown that two SSVEP targets placed at different depths in a single direction of view can elicit distinguishable cortical responses [42]. By spacing out stimuli in all three dimensions of space, the number of simultaneous targets can be increased. Multiple colors could be used to improve discernibility between targets, hence improving the classification accuracy.
One of the limitations of this study was the use of integer frequencies that extended to higher frequencies, which yielded lower accuracies. Performance could be further improved by testing multiple frequencies (integer and non-integer values) to identify a set of frequencies that yields the strongest response and highest accuracy.

V. CONCLUSION
Stimulus properties determine the strength and quality of exogenous brain responses. The main advantages of AR technology for SSVEP-based BCIs are system portability and integration with a person's surroundings. The combination of the two requires an engaging stimulation paradigm within concise layouts. In this study, we evaluated the use of dry electrodes and an optical see-through head-mounted display, finding that both 3D and 2D single-graphic SSVEP stimuli yielded similar participant performance and may be used for designing BCI experiments. However, for stimulation periods of less than 3.5 s, the flickering stimulus gave higher accuracy and information transfer rate.