SSVEP Stimulus Layout Effect on Accuracy of Brain-Computer Interfaces in Augmented Reality Glasses

Steady-state visual evoked potential-based brain-computer interfaces (SSVEP-BCIs) offer a high information transfer rate (ITR) and require little user training, which makes them valuable for disability assistance and human-computer interaction. An SSVEP-BCI generally requires a personal computer (PC) screen to display several repetitive visual stimuli that induce the SSVEP response, which reduces its portability and flexibility. Displaying the repetitive visual stimuli on head-worn augmented reality (AR) glasses could overcome these drawbacks, but it is unknown whether AR glasses can achieve the same accuracy as a PC screen given their reduced brightness and increased interference. In the current study, we first designed 4 stimulus layouts and displayed them with Microsoft HoloLens glasses (AR-SSVEP). Comparison analysis showed that the classification accuracy is influenced by the stimulus layout when the stimulus duration is less than 3 s; when the stimulus duration exceeds 3 s, there is no significant accuracy difference between the 4 layouts. We then designed a similar experimental paradigm on a PC screen (PC-SSVEP) based on the best AR layout. Classification results showed that AR-SSVEP achieved accuracy similar to that of PC-SSVEP when the stimulus duration is more than 3 s, but lower accuracy than PC-SSVEP when the stimulus duration is less than 2 s. Brain topographic analysis indicated that the spatial distributions of the SSVEP responses are similar, both being strongest in the occipital region. The current study indicates that stimulus layout is a key factor when building an SSVEP-BCI with AR glasses, especially when the stimulation time is short.


I. INTRODUCTION
In recent years, brain-computer interfaces (BCIs) based on steady-state visual evoked potentials (SSVEP) have attracted a lot of attention due to their high information transfer rate (ITR) and minimal user training [1]-[3]. They can be used in smart home appliances [4], [5], disability assistance [6]-[9], human-computer interaction [10]-[12], games and entertainment [13], [14], and other fields. An SSVEP is evoked by a repetitive stimulus with a constant frequency on the central retina. Using frequency recognition algorithms, an SSVEP-BCI can detect which stimulus the user is gazing at from a set of stimuli flickering at different frequencies [15], [16]. Generally, the repetitive visual stimuli are rendered by a personal computer (PC) screen or light-emitting diode (LED) lights [17], and the position of the stimulator is often fixed and inconvenient to move. This reduces the portability and flexibility of the SSVEP-BCI: users often have to sit or stand still to complete the interactive tasks, which greatly limits the application of SSVEP-BCIs in human-computer interaction.
The associate editor coordinating the review of this manuscript and approving it for publication was Emil Jovanov.
Combining augmented reality (AR) technologies [18] with BCI, namely AR-BCI, could improve the portability and flexibility of SSVEP-BCI, because the repetitive visual stimuli can be displayed in AR glasses worn on the head. According to the type of AR technology used, AR-BCI systems can be divided into video see-through (VST) AR and optical see-through (OST) AR [19], [20]. A VST-AR system acquires real images using a camera or a computer and combines them with virtual pictures in a real-time display, and several researchers have developed BCI systems relying on VST-AR [21]-[26]. However, the leading development direction of AR technology is OST-AR. Head-mounted displays (HMDs) are the preferred method for OST-AR; they blend the real with the virtual by means of a scene generator that combines video with graphic images. An OST HMD works by placing optical combiners in front of the user's eyes [27], so that users see the real world directly and, at the same time, virtual images bounced off the combiners from half-silvered monitors. Displaying repetitive visual stimuli via OST-AR technology could overcome the viewing limitations of a PC screen and improve the portability and flexibility of SSVEP-BCI at the same time. In 2018, Hakim Si-Mohammed et al. verified the feasibility of combining an OST-HMD with a BCI by designing and evaluating application prototypes of an AR-BCI system [28]. Ming et al. designed an SSVEP-BCI with OST-AR and achieved 8-command robotic arm control; their results showed that, compared with a PC screen, OST-AR requires a longer stimulation time to guarantee high accuracy because the transparency of the stimulation weakens the evoked SSVEP responses [29].
To date, the number of AR-BCI studies is much smaller than that of PC-BCI studies, indicating that AR-BCI is still at an early stage. As far as we know, there are few practical applications of SSVEP-BCI based on OST-AR, possibly because the research conclusions of PC-BCI cannot be directly transferred to AR-BCI. OST-AR uses holographic projection [30]: the brightness of the stimulus is weaker than on a PC screen, while the interference between stimuli is stronger. In addition, when the subject wears the AR glasses, the repetitive stimulus being gazed at may be disturbed by light in the real environment, which may decrease the classification accuracy of the BCI. In our opinion, displaying the repetitive stimuli in OST-AR glasses could significantly expand the application area of SSVEP-BCI, but careful comparison studies between PC-BCI and AR-BCI are still needed.
In the current study, we (1) designed 4 different display layouts in AR glasses, verified the influence of the layout by comparing the classification accuracies obtained with the different layouts, and explained why the layout affects classification performance; and (2) used a PC screen and Microsoft HoloLens (AR) glasses to set up similar experimental environments for displaying the SSVEP stimuli, and compared them in terms of classification accuracy, power spectrum, and brain topographic map. The results of the current study provide guidance for designing SSVEP-BCIs in AR environments.

A. SUBJECT AND EXPERIMENTAL ENVIRONMENT
Ten healthy subjects (2 females, 8 males, aged 21-26 years) volunteered to participate in the experiment, and all of them had normal or corrected-to-normal vision. All participants read and completed an informed consent form before the experiment. The electroencephalogram (EEG) was recorded in an electrically shielded cabin. The stimulus presentation and recording computers were outside the recording room.
The EEG signals were acquired with SynAmps2 amplifiers (Neuroscan Instrument, USA). A total of 64 electrodes were used to record the EEG signals, placed according to the international 10-20 system. All electrodes were referenced to the AFz electrode, and the impedances were kept below 10 kΩ during recording. The EEG signals were sampled at 1000 Hz and filtered between 0.5 and 45 Hz. Fig. 1 shows the experimental environment; the hardware consisted of the EEG acquisition system and the HoloLens [31]. The HoloLens (Microsoft Corporation, Washington) is an OST-HMD device that can overlay virtual objects onto the real-world surroundings of the user. It is a complete AR system with a custom-designed holographic processing unit and see-through optical lenses with a holographic projector; it has a fixed focal length of 2 m, and the background is transparent [32].

B. EXPERIMENT SETUP
Repetitive visual stimuli (SSVEP targets) were displayed on a PC screen and on Microsoft HoloLens (AR) glasses. Both were set to a 60 Hz frame rate with a screen resolution of 1280 × 720, and the other experimental conditions were kept consistent. In order to analyze the difference between the offline classification results of PC-SSVEP and AR-SSVEP and to evaluate the influence of different AR-SSVEP stimulus layouts on the classification results, the two stimulation paradigms were designed as follows.
In the AR-SSVEP stimulation experiment, participants sat in the shielded room wearing the HoloLens. Four white rectangles [33] were presented as stimuli on the AR screen (see    a welcome message after a blank screen. There were 30 trials at each flashing frequency, and 120 trials in a run. In each trial, the subjects were first instructed to gaze at the target stimulus during the ready period; the target stimuli then flashed for 4 seconds, followed by a 3-second break, as shown in Fig. 4. Subjects were advised to avoid unnecessary blinks and eye movements while gazing at any of the four stimuli. In order to study the effect of layout on SSVEP classification performance, we symmetrically moved the flashing rectangles from the middle to both sides (AR-Position1 to AR-Position4) in steps of 128 pixels (3° of visual angle), as shown in Fig. 3. In total, 1 run consisting of 120 trials was recorded for each of the 4 positions. Half of the subjects performed the experiment from AR-Position1 to AR-Position4, and the order was reversed for the other half to reduce the impact of visual fatigue on the accuracy of the different stimulus layouts. Subjects selected the most comfortable stimulus layout at the end of the experiment.
In the PC-SSVEP experiment, participants sat in the same shielded room at 60 cm or 200 cm from the screen (PC-60 and PC-200, respectively). The experimental paradigm was similar to that of AR-SSVEP, and the four flashing rectangles were arranged in accordance with AR-Position2, as shown in Fig. 5. In total, 120 trials were recorded for each of PC-60 and PC-200.

C. DATA PREPROCESSING
The EEG was segmented using the stimulus markers that labeled the start and end of the flickering. The trend in the segmented data was removed, and the data were band-pass filtered with cut-off frequencies of 5 and 40 Hz in order to remove the DC component and high-frequency artifacts, including power line noise (50 Hz). Nine EEG channels (Oz, O1, O2, Pz, POz, PO3, PO4, PO7, and PO8) were selected for SSVEP recognition.
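The preprocessing steps described above (trend removal followed by 5-40 Hz band-pass filtering of the nine selected channels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text gives only the cut-off frequencies, so the Butterworth design, the filter order, the zero-phase filtering, and the names `preprocess_epoch` and `demo_preprocess` are assumptions made here.

```python
import numpy as np
from scipy.signal import butter, detrend, sosfiltfilt

FS = 1000  # sampling rate in Hz, as stated in the text
CHANNELS = ["Oz", "O1", "O2", "Pz", "POz", "PO3", "PO4", "PO7", "PO8"]


def preprocess_epoch(epoch, fs=FS, band=(5.0, 40.0), order=4):
    """Detrend and band-pass filter one epoch of shape (channels, samples).

    Assumption: a Butterworth filter of the given order, applied forwards
    and backwards. The text only specifies the 5 and 40 Hz cut-offs.
    """
    epoch = detrend(np.asarray(epoch, dtype=float), axis=-1, type="linear")
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    # Zero-phase filtering avoids shifting the SSVEP in time.
    return sosfiltfilt(sos, epoch, axis=-1)


def demo_preprocess():
    """Synthetic check: a 10 Hz tone plus offset, drift and 50 Hz noise."""
    t = np.arange(4 * FS) / FS
    clean = np.sin(2 * np.pi * 10 * t)
    noisy = clean + 5.0 + 2.0 * t + np.sin(2 * np.pi * 50 * t)
    epoch = np.tile(noisy, (len(CHANNELS), 1))
    return clean, preprocess_epoch(epoch)
```

In this synthetic example the 10 Hz component passes essentially unchanged, while the offset, the linear drift and most of the 50 Hz component are suppressed.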

D. CANONICAL CORRELATION ANALYSIS
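As a multivariate statistical method, canonical correlation analysis (CCA) explores the underlying correlation between two sets of data. Given two sets of random variables $X \in \mathbb{R}^{I_1 \times J}$ and $Y \in \mathbb{R}^{I_2 \times J}$, which are normalized to have zero mean and unit variance, CCA seeks a pair of linear transforms $w_x \in \mathbb{R}^{I_1}$ and $w_y \in \mathbb{R}^{I_2}$ such that the correlation between the linear combinations $\tilde{x} = w_x^T X$ and $\tilde{y} = w_y^T Y$ is maximized:
$$\max_{w_x, w_y} \rho = \frac{E[\tilde{x}\tilde{y}^T]}{\sqrt{E[\tilde{x}\tilde{x}^T]\,E[\tilde{y}\tilde{y}^T]}} = \frac{w_x^T X Y^T w_y}{\sqrt{w_x^T X X^T w_x \; w_y^T Y Y^T w_y}} \tag{1}$$
The maximum of the correlation coefficient $\rho$ with respect to $w_x$ and $w_y$ is the maximum canonical correlation.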
A CCA-based frequency recognition method was first introduced to SSVEP-based BCI by Lin et al. [15]. The CCA method provides better recognition performance than power spectral density (PSD) analysis, since it optimizes the combination of multiple channels to improve the signal-to-noise ratio. Assume that our aim is to recognize the target frequency (i.e., the SSVEP frequency) from M stimulus frequencies in an SSVEP-based BCI. $\hat{X} \in \mathbb{R}^{C \times P}$ is a test data set consisting of EEG signals from C channels with P time points in each channel. $Y_m \in \mathbb{R}^{2H \times P}$ is a pre-constructed reference signal set at the m-th stimulus frequency $f_m$ ($m = 1, 2, \ldots, M$), formed by a series of sine and cosine waves as
$$Y_m = \begin{pmatrix} \sin(2\pi f_m t) \\ \cos(2\pi f_m t) \\ \vdots \\ \sin(2\pi H f_m t) \\ \cos(2\pi H f_m t) \end{pmatrix}, \quad t = \frac{1}{F}, \frac{2}{F}, \ldots, \frac{P}{F} \tag{2}$$
where H is the number of harmonics and F denotes the sampling rate; H was set to 1 and F to 1000 in the current study. Solving for the maximal correlation coefficient $\rho_m$ between $\hat{X}$ and $Y_m$ ($m = 1, 2, \ldots, M$) by (1), the SSVEP frequency is then recognized by
$$\hat{f} = \arg\max_{f_m} \rho_m, \quad m = 1, 2, \ldots, M \tag{3}$$
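The CCA-based recognition described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: the canonical correlation is computed through a QR decomposition followed by an SVD (a standard numerical route that assumes the EEG channels are linearly independent), and the function names are introduced here for illustration only.

```python
import numpy as np


def reference_signals(freq, n_samples, fs=1000, n_harmonics=1):
    """Build the reference set: sine/cosine rows at freq and its harmonics.

    Returns an array of shape (2 * n_harmonics, n_samples).
    """
    t = np.arange(1, n_samples + 1) / fs
    rows = []
    for h in range(1, n_harmonics + 1):
        rows.append(np.sin(2 * np.pi * h * freq * t))
        rows.append(np.cos(2 * np.pi * h * freq * t))
    return np.vstack(rows)


def max_canonical_corr(X, Y):
    """Largest canonical correlation between the rows of X and the rows of Y.

    X has shape (channels, samples), Y has shape (2H, samples).
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    # Orthonormal bases of the two row spaces; the singular values of their
    # cross-product are the canonical correlations.
    Qx, _ = np.linalg.qr(Xc.T)
    Qy, _ = np.linalg.qr(Yc.T)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]


def recognize_frequency(X, candidate_freqs, fs=1000, n_harmonics=1):
    """Return the candidate frequency with the largest correlation, plus all correlations."""
    rhos = [
        max_canonical_corr(X, reference_signals(f, X.shape[1], fs, n_harmonics))
        for f in candidate_freqs
    ]
    return candidate_freqs[int(np.argmax(rhos))], rhos
```

The candidate frequencies are passed in by the caller; for a 4-target paradigm this would be the list of the four flicker frequencies, and `X` would be a preprocessed epoch truncated to the time window under test.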

E. POWER SPECTRUM DENSITY ESTIMATION
Characteristics of the acquired EEG signal were computed by PSD estimation in order to selectively represent the SSVEP response. First, the channels for PSD analysis were selected by the method proposed in [34]. The fast Fourier transform [35] with a Hanning window was used to calculate the power spectrum of the preprocessed EEG on the Oz channel for a 4 s stimulus length. Furthermore, in order to observe the topographical distribution of the power of the responses at different frequencies, the PSD of all 64 EEG channels was calculated, and the topographical distribution of the fundamental-frequency response was obtained.
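A Hanning-windowed FFT power spectrum of a single channel, as described above, might be computed as in the following sketch. The normalization (power per Hz, one-sided) and the function name `hann_power_spectrum` are assumptions made for illustration; the text does not state which scaling was used.

```python
import numpy as np


def hann_power_spectrum(signal, fs=1000):
    """One-sided power spectrum of a 1-D signal using a Hanning window."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(signal.size)
    spectrum = np.fft.rfft((signal - signal.mean()) * window)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    # Normalise by the window energy so that values are in units^2 per Hz.
    psd = (np.abs(spectrum) ** 2) / (fs * np.sum(window ** 2))
    # Fold negative frequencies into the one-sided estimate; the DC bin
    # (and the Nyquist bin, for even lengths) has no mirror image.
    if signal.size % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    return freqs, psd
```

For a 4 s epoch sampled at 1000 Hz the frequency resolution is 0.25 Hz, so the SSVEP peak falls on the bin nearest to the stimulus frequency.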

A. INFLUENCE OF STIMULUS LAYOUT ON AR-SSVEP CLASSIFICATION
We calculated the classification accuracy for different time windows, all of which started at stimulus onset but differed in epoch length. As shown in Fig. 6, the classification accuracy of the 10 subjects gradually increased as the time window became longer. The accuracies varied slightly from layout to layout. Taking a classification accuracy of 90% as the threshold, the number of subjects who reached the threshold at AR-Pos2 was higher than at the other 3 positions for time windows of 1 s, 2 s, and 3 s. Fig. 7 shows the average classification accuracy of the 10 subjects for time window lengths from 0.5 s to 4 s; a paired t-test was used to test whether the accuracies obtained at the 4 AR stimulus positions differ significantly. The classification accuracy increased with the time window length until it reached a ceiling at 3 s. The average classification accuracy of AR-Pos2 was significantly higher than that of AR-Pos1 (p<0.05) for time window lengths of 0.5 s and 1 s. For time windows of 1.5 s and 2 s, AR-Pos2 had significantly higher accuracy than AR-Pos4 (p<0.05). When the time window was longer than 3 s, there was no significant difference between the 4 positions. Overall, AR-Pos2 achieved the best classification performance, with average accuracies of 74.6%, 89.0%, 94.6% and 95.6% for time window lengths of 1, 2, 3 and 4 s, respectively.
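The layout comparison above relies on paired t-tests across subjects. A minimal sketch of such a comparison for one time window is given below, using `scipy.stats.ttest_rel`. The helper name `compare_layouts` and the 0.05 threshold default are illustrative assumptions, and the per-subject accuracies must be supplied by the caller because they are not listed in the text.

```python
import numpy as np
from scipy.stats import ttest_rel


def compare_layouts(acc_a, acc_b, alpha=0.05):
    """Paired t-test on per-subject accuracies of two layouts.

    acc_a and acc_b hold one accuracy per subject, in the same subject order.
    """
    acc_a = np.asarray(acc_a, dtype=float)
    acc_b = np.asarray(acc_b, dtype=float)
    if acc_a.shape != acc_b.shape:
        raise ValueError("both layouts need one accuracy per subject")
    t_stat, p_value = ttest_rel(acc_a, acc_b)
    return {
        "mean_diff": float(np.mean(acc_a - acc_b)),
        "t": float(t_stat),
        "p": float(p_value),
        "significant": bool(p_value < alpha),
    }
```

Repeating the call for each time window length and each pair of layouts reproduces the structure of the comparison, although whether any correction for multiple comparisons was applied is not stated in the text.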
The above results were consistent with the subjective impressions of the participants: 7 of the 10 participants selected AR-Pos2 as the most appropriate layout in the final survey, and all 10 subjects reported that the stimulus arrangement of AR-Pos1 was strongly irritating to the eyes. Fig. 8 shows the PSD on channel Oz of two representative subjects for the 4 s time window. The repetitive stimuli at all 4 frequencies induced a strong SSVEP response, and there was no significant difference in the power values between the 4 stimulus positions.
Then, the SSVEP power distributions of the 4 positions were analyzed. As can be seen from Fig. 9, the SSVEP power at each position was mainly located in the occipital region. Among the 4 positions, the power of AR-Pos2 was relatively high.

B. COMPARISON OF CLASSIFICATION RESULTS BETWEEN PC-SSVEP AND AR-SSVEP
Based on the results of the previous section, we selected AR-Pos2 to represent AR-SSVEP and compared its performance with the two PC-SSVEP settings. The classification accuracies for different time window lengths are shown in Fig. 10. For all 3 conditions, the classification accuracies of the 10 subjects gradually increased with the SSVEP segment length. The classification accuracies of PC-60 were higher than those of AR-SSVEP and PC-200 for most of the subjects. The average accuracy over the 10 subjects is shown in Fig. 10(B). The average classification accuracy of PC-60 became stable at a time window length of 2 s (accuracy = 92.0%), whereas that of AR-Pos2 became stable at 3 s (accuracy = 93.5%). The lowest average accuracy was obtained with the PC-200 paradigm. Paired t-tests were performed between each pair of PC-60, AR-Pos2, and PC-200. The accuracy of PC-60 was significantly higher than that of AR-Pos2 (p<0.05) and PC-200 (p<0.01) for a time window length of 0.5 s. There was no significant difference between PC-60 and AR-Pos2 (p>0.05) when the time window was longer than 1 s. Both PC-60 and AR-Pos2 achieved significantly higher accuracy than PC-200 (p<0.05). The mean ITRs for the different time window lengths were further calculated and are listed in Table 1.
The power spectra of each channel under the 4 stimulus frequencies were further calculated, and the topographic maps of the averaged SSVEP response power spectrum are shown in Fig. 12. The topographic maps of the three paradigms are similar, and an obvious increase in power over the occipital region can be observed for all of them.
When the mean value of the O1, O2 and Oz channels was used to represent the SSVEP response of the occipital region, the occipital SSVEP response of PC-60 was significantly stronger than that of AR-Pos2 and PC-200 (p<0.001), and there was no significant difference between AR-Pos2 and PC-200 (p = 0.245).

IV. DISCUSSIONS
A. FEASIBILITY FROM PC-SSVEP TO AR-SSVEP
The above results show that it is feasible to perform SSVEP experiments with OST-AR. If the flashing stimulus duration is more than 1 s, there is no significant difference in classification accuracy between AR-Pos2 and PC-60. When the stimulus duration reached 3 s, the difference in average classification accuracy between the two paradigms was only 0.68%. PSD and topographic analysis further confirmed that both the PC-60 and AR-Pos2 paradigms induce a strong SSVEP response at the stimulus frequency over the occipital region.
5994 VOLUME 8, 2020
At the same time, when the stimulation time was short, the classification accuracy of SSVEP induced by the HoloLens was lower than that of a PC screen, which may be due to background interference in the AR field of view and the subjects' degree of adaptation to the HoloLens. These results are consistent with the study of Ming et al. [29], in which the accuracy difference between AR-SSVEP and PC-SSVEP increased as the stimulation time decreased below 2 s. A previous study found that, on a black background, the SSVEP response is stronger when the contrast of the flashing stimulus is larger [36]. Compared with the PC screen, the stimulus in the HoloLens is smaller and its contrast is weaker, which may be why a strong SSVEP response cannot be obtained under short stimulation. In practical applications, environmental interference seen through the AR glasses may also affect the SSVEP response; we therefore recommend a stimulus duration of more than 2 s to ensure high accuracy.
The classification accuracy of PC-200 was lower than that of PC-60 and AR-Pos2 for most of the subjects. Compared with PC-60, the contrast of the stimulus in PC-200 is the same, but the size of the stimulus in the field of view becomes smaller as the viewing distance increases, which may weaken the SSVEP response. Several studies support this explanation [37], [38], and it has been reported that the size of the stimulus is the most important parameter affecting SSVEP classification accuracy [39].
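The shrinking of the stimulus in the field of view with distance can be made concrete with the standard visual-angle formula, angle = 2·atan(size / (2·distance)). In the sketch below the function name is hypothetical, and the physical stimulus size is not given in the text, so any size passed in is an assumption; only the 60 cm and 200 cm viewing distances come from the experiment.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of a given size and distance."""
    if size_cm <= 0 or distance_cm <= 0:
        raise ValueError("size and distance must be positive")
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Hypothetical 6 cm wide rectangle; the real stimulus size is not stated.
angle_pc60 = visual_angle_deg(6.0, 60.0)
angle_pc200 = visual_angle_deg(6.0, 200.0)
ratio = angle_pc200 / angle_pc60  # close to 60 / 200 for small angles
```

For small stimuli the ratio of the two angles is close to 60/200 = 0.3 regardless of the assumed size, i.e. the same rectangle covers roughly 30% of the angular width at 200 cm that it covers at 60 cm.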

B. INFLUENCE OF STIMULUS DURATION AND LAYOUT ON THE CLASSIFICATION PERFORMANCE OF AR-SSVEP
When the stimulus duration was less than 3 s, AR-Pos2 achieved the highest classification accuracy among the 4 positions, and average accuracies of 89.0% and 94.6% were obtained with stimulus durations of 2 s and 3 s, respectively. Compared with the existing AR-BCI study [28], this study obtained higher recognition accuracy with a shorter stimulus duration.
Why did AR-Pos2 achieve the best SSVEP recognition accuracy among the 4 stimulus layouts? We consider two possible reasons: the angle between the eye and the stimulus, and the distance between the stimuli. (1) The horizontal field of view of the HoloLens is 30 degrees. When the same stimulus rectangles move horizontally from AR-Pos1 to AR-Pos4, each move changes the gaze direction by nearly 3 degrees. The power of the SSVEP response decreases as the horizontal angle increases [40]. If the reduction in power were the only cause of the accuracy differences between the 4 positions, the order of the average accuracies should be AR-Pos1 > AR-Pos2 > AR-Pos3 > AR-Pos4. However, the actual order obtained in the current study is AR-Pos2 > AR-Pos1 > AR-Pos3 > AR-Pos4, so the horizontal angle alone cannot explain the results. (2) Competing stimuli have a significant suppressive effect on the dominant frequency response [41]. The center-surround structure of receptive fields in the visual cortex [42] indicates that there is an inhibitory region surrounding the excitatory region at a relatively small distance. As the stimulus rectangles are immediately next to each other in AR-Pos1, each stimulus may be within the inhibitory surround region of the others; a similar mutual inhibition effect between stimuli can also be found in the study of J. Mu et al. [43]. The mutual inhibition effect suggests that the distance between two adjacent visual stimuli should be selected carefully, especially when the stimulus time is short. Compared with PC-SSVEP, the mutual inhibition effect in AR-SSVEP is more obvious because the gaze point cannot be switched by turning the eyes or moving the head.
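The figure of about 3 degrees per move follows from the numbers given in the text if the 1280-pixel display width is assumed to span the full 30-degree horizontal field of view with a uniform angular pixel pitch. That mapping is an assumption made for this sketch, and the function name is hypothetical.

```python
def pixels_to_degrees(n_pixels, fov_deg=30.0, width_px=1280):
    """Convert a horizontal pixel offset to degrees of visual angle.

    Assumption: the display width covers the whole horizontal field of view
    and every pixel subtends the same angle (a small-angle approximation).
    """
    if width_px <= 0:
        raise ValueError("display width must be positive")
    return n_pixels * fov_deg / width_px


step_deg = pixels_to_degrees(128)  # one layout step of 128 pixels
```

Under this assumption a 128-pixel step corresponds to 128 × 30 / 1280 = 3 degrees, which matches the value quoted for the shift between adjacent layouts.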
When the stimulus duration was longer than 3 s, there was no significant difference in average classification accuracy between the 4 positions, and the maximal difference in average accuracy between them was only 1.89%. The SSVEP power spectrum on the Oz channel in Fig. 11 is also consistent with this result. Meanwhile, the comparison of the classification accuracies of PC-60 and AR-Pos2 shows that the classification accuracy of AR-Pos2 is close to stable when the stimulus duration reaches 3 s. Therefore, the stimulus layout has no significant effect on the classification performance of AR-SSVEP when a longer stimulation time is selected.
In general, AR-SSVEP is more sensitive to the stimulus duration and layout than PC-SSVEP; if the experimental configuration is not appropriate, the SSVEP recognition accuracy will suffer. A longer stimulus time can guarantee high recognition accuracy, but it may reduce the ITR. When recognition accuracy is prioritized, the choice of stimulus duration is also affected by the spacing between adjacent stimuli.
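The trade-off between stimulus duration and ITR can be illustrated with the widely used Wolpaw definition of ITR. The text does not state which formula was used for its ITR values, so the formula choice here is an assumption, as is the simplification that the selection time equals the stimulus duration (no gaze-shift or rest time). The function name is hypothetical.

```python
import math


def itr_bits_per_min(n_targets, accuracy, selection_time_s):
    """Wolpaw information transfer rate in bits per minute."""
    if n_targets < 2:
        raise ValueError("need at least two targets")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if selection_time_s <= 0:
        raise ValueError("selection time must be positive")
    bits = math.log2(n_targets)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        # Errors are assumed to be spread evenly over the wrong targets.
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits * 60.0 / selection_time_s
```

With the 4 targets and the average AR-Pos2 accuracies reported above, `itr_bits_per_min(4, 0.89, 2.0)` gives about 39.8 bits/min and `itr_bits_per_min(4, 0.946, 3.0)` about 32.2 bits/min, so under these simplifying assumptions the longer window raises accuracy but lowers the ITR. These values ignore any inter-trial time and need not match the ITRs reported by the authors.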

C. FUTURE IMPROVEMENT OF THE AR-SSVEP PARADIGM
In the experimental paradigm designed by X. Chen et al., 40 targets flashing on a PC screen could be accurately recognized by an SSVEP-BCI [3]; the stimuli were evenly distributed on the screen. We may not be able to adopt their stimulus layout directly in AR glasses if we want to recognize so many targets with an AR-BCI. There are two challenges: (1) the advantage of OST-AR is that it allows users to observe the real world without barriers, but displaying a large number of stimulus targets in AR occludes the real environment and reduces the practical value of AR-BCI applications; (2) with the holographic imaging technique of the AR device, the SSVEP recognition accuracy is affected by the stimulus distribution, and lower accuracy may be obtained for targets appearing at the edge of the holographic projector. Although 40 targets are not feasible for AR-SSVEP, it is still necessary to develop AR-SSVEP paradigms with more than 4 targets for AR-BCI applications. We consider that the AR-SSVEP paradigm could be improved in the following 2 directions. (1) Introducing three-dimensional spatial stimulation design. Unlike the two-dimensional stimulation design on a PC screen, OST-AR lets users observe the 3D display space in front of them, and the spatial anchor technology of the HoloLens can lock the virtual stimuli to real-world physical coordinates [44]. An appropriate number of stimuli can be placed in each field of view, and users can switch between fields of view by moving their head. This increases the number of targets and also makes the human-computer interaction more interesting. (2) Introducing stereoscopic stimulation design. It has been reported that stereo vision can lead to a high degree of attention, and OST-AR can provide users with depth perception of the visual stimuli [45].
Specifically, OST-AR has the advantage of 3D spatial presentation, so the stimuli can be changed from planar to stereoscopic shapes, for example by using cubes instead of squares. Since this study focuses on comparing the SSVEP response difference between PC screen and AR stimuli, we did not investigate the relationships between these stimulation designs and AR-SSVEP recognition performance.

V. CONCLUSION
In this study, we evaluated the feasibility of transferring SSVEP stimuli from a PC screen to the holographic projection of AR glasses. Comparison results showed that AR-SSVEP achieves classification accuracy similar to that of PC-SSVEP when the stimulus duration is more than 3 s, but lower accuracy than PC-SSVEP when the stimulus duration is less than 2 s. For AR-SSVEP, a reasonable stimulus layout is the key factor for obtaining higher classification accuracy when the stimulus duration is less than 3 s, and there is no significant difference in classification accuracy between layouts for longer stimulus durations. Although the results are encouraging, some interference factors still need to be considered in the application of AR-BCI systems, such as the effects of illumination and the contrast between the flickers and the real environment. In the future, we will further explore the performance and value of AR-BCI in practical applications.