Frequency Recognition of Short-Time SSVEP Signal Using CORRCA-Based Spatio-Spectral Feature Fusion Framework

Brain-computer interface (BCI) refers to the recognition of brain activity in order to generate corresponding commands for interacting with external devices. Owing to their safety and high temporal resolution, electroencephalogram (EEG) based BCIs have become popular. The steady-state visual evoked potential (SSVEP) is an EEG response that is particularly attractive because of its high signal-to-noise ratio (SNR) and robustness. A spatio-spectral feature fusion approach is introduced to recognize the frequency of short-time SSVEP using correlated component analysis (CORRCA). Two reference signals are generated by averaging each half of the training trials. The signal of each channel is passed through a filter bank that decomposes it into a predefined set of subbands. The spatial correlation coefficients between each subband of the test trial and the reference signals are calculated using CORRCA. The two sets of coefficients derived from the two reference signals are merged and sorted in descending order. The coefficients thus obtained are weighted using a nonlinear function that defines their contribution to frequency recognition. The weighted coefficients are fused to obtain a single coefficient for the target stimulus frequency in each subband. The coefficients derived for the individual subbands are weighted with another nonlinear function and fused into a single coefficient for the target stimulus. The same process is applied to each stimulus frequency, and the frequency corresponding to the highest coefficient is recognized as the target stimulus. The proposed method outperforms other existing algorithms in recognizing the stimulus frequencies of SSVEP.

relative portability, low cost, and excellent temporal resolution [4], [5]. It has also received increasing attention from researchers in biomedical engineering, neuroscience, neural engineering, clinical rehabilitation, and so on due to its noninvasiveness. Several types of EEG signals are used as modalities for controlling a BCI system. These include event-related potential (ERP) [6], [7], sensorimotor rhythm (SMR) [8], [9], steady-state visual evoked potential (SSVEP) [10], [11], and hybrid BCI [12], [13]. Some researchers have begun to examine hybrid BCI approaches that use multiple modalities from brain signals [14], [15]. Traditional BCI systems encode a small number of commands, which limits their areas of application. Recently, a high-speed hybrid BCI system has been proposed that supports a relatively larger number of commands [16]. Concurrent P300 and SSVEP features are used to encode the commands, and they are decoded by an ensemble task-related component analysis.
SSVEP-based BCI has received increasing interest from researchers because it requires little user training and offers a high signal-to-noise ratio (SNR) and a relatively high information transfer rate (ITR). Besides, reducing calibration time in SSVEP is a demanding issue in the related research community. The SSVEP is a periodic EEG response elicited by repetitive visual stimuli flickering at frequencies higher than 6 Hz [17], [18]. The subject focuses on a stimulus flickering at a specific frequency among a set of stimulus frequencies.
The SSVEP signal is then generated in the occipital lobe of the brain at the same frequency as the stimulus, along with multiple harmonics. The goal of an SSVEP-based BCI system is to detect the frequency of the attended stimulus from the captured SSVEP brain signals. A practical and effective frequency detection algorithm that recognizes the SSVEP frequency with high accuracy and high ITR from a short time window plays an important role in the overall performance of an SSVEP-based BCI implementation [4].
Various approaches have been developed to extract SSVEP features for frequency recognition. The most popular and widely used SSVEP frequency recognition algorithm, based on multivariate statistics, is canonical correlation analysis (CCA) [19], [20]. The standard CCA method introduced by Lin et al. [19] is a spatial filtering technique that uses artificial sinusoids as reference signals. It was proposed for SSVEP frequency recognition and needs no calibration data. The method is easy to implement and involves no complex optimization procedures. In CCA, the reference signals are preconstructed sine-cosine waves. However, because these preconstructed waves lack features of the real EEG data, the CCA method may not achieve the optimal recognition accuracy. To address this problem, multiset canonical correlation analysis (MsetCCA) [21] has been proposed to optimize the reference signals used in the CCA method for SSVEP frequency recognition. The optimized reference signals are constructed by combining the common features and are based entirely on training data. An extension of MsetCCA, the multilayer correlation maximization (MCM) model [22], has also been proposed to avoid extracting possible noise components as common features. It improves the recognition accuracy of SSVEP by combining the advantages of both CCA and MsetCCA through three layers of correlation maximization. The multivariate synchronization index (MSI) based algorithm can hardly exploit the full SSVEP-related harmonic components in the EEG, which limits the application of the MSI algorithm in BCI systems. To overcome this limitation, a filter bank-driven MSI algorithm (FBMSI) [23] has been proposed to further improve SSVEP recognition accuracy.
Researchers now often prefer correlated component analysis (CORRCA) over CCA as the spatial filtering technique because of its high performance. Like CCA, it is a multivariate statistical approach [24]. CORRCA is a spatial filtering technique that extracts the linear combinations of data that maximize the Pearson product-moment correlation coefficient between two multivariate signals. Whereas CCA needs two different projection vectors, CORRCA assigns only one. Moreover, CORRCA relaxes the orthogonality constraint that must be maintained in CCA. Furthermore, CCA doubles the number of free parameters by assigning two different projection vectors to the two multichannel signals, which increases the computational cost unnecessarily. CORRCA, on the other hand, simplifies the subsequent analysis and reduces the computational cost. CORRCA could therefore be an effective alternative for designing an SSVEP frequency recognition method, and it is more efficient in real-world implementations of SSVEP-based BCI systems. Although the performance of CORRCA-based algorithms reported in the literature is satisfactory, there is scope to improve these systems by applying more sophisticated signal processing techniques. One promising approach is the filter bank analysis technique [20].
In this paper, a filter bank based method is implemented to recognize the frequency of SSVEP signals. The recorded multichannel SSVEP signal is passed through bandpass filters that decompose it into multiple subbands. Two multivariate reference signals are generated by averaging each half of the training dataset. CORRCA-based coefficients are extracted for each subband using each reference signal. The two sets of features thus obtained for each subband are merged and sorted. The weighted sum of the coefficients is computed using a nonlinear weighting function. The features thus obtained from each subband are combined using a weighted combination method to create a more discriminative fused feature [25]. To improve the performance of the frequency recognition method, the filter bank approach is used as a form of feature fusion to create more discriminative features. Finally, the stimulus corresponding to the maximum coefficient is determined as the recognized frequency of SSVEP. The overall performance has been enhanced by using the feature fusion approach. The standard CORRCA method yields multiple correlation coefficients to measure the correlation between two multivariate signals. Only the largest coefficient is then selected as the feature, and the rest of the coefficients are discarded [26]. Ignoring these coefficients results in a loss of discriminative information. Motivated by the previous feature fusion strategy, we propose a spatio-spectral feature fusion framework to create more robust and discriminative fused features for target frequency recognition in SSVEP-based BCI.
To evaluate the performance of the proposed feature fusion framework, the popular frequency recognition method CORRCA is used as a representative spatial filtering technique for the frequency recognition of SSVEP. Furthermore, we have conducted the experiments on two publicly available standard SSVEP datasets. One comprises 10 subjects with 12 stimuli and the other contains 35 subjects with 40 stimuli. The performance of the proposed method, in terms of accuracy and ITR, is compared with that of existing algorithms.
The remaining parts of this paper are organized as follows. Section II describes the SSVEP datasets used in the experiments, Section III describes the methodology, Section IV illustrates the experimental results, Section V presents a discussion on the performance evaluation and finally, Section VI includes the concluding remarks.

II. DATA DESCRIPTION
Two publicly available SSVEP datasets, termed Dataset I and Dataset II hereafter, are adopted to evaluate the performance of the proposed method. The datasets are described below.

A. DATASET I (10 SUBJECTS, 12 STIMULI)
Dataset I was prepared by the Swartz Center for Computational Neuroscience, Institute for Neural Computation, University of California San Diego, La Jolla, California, USA [27]. The data are obtained from an offline SSVEP-based BCI experiment. Ten healthy subjects volunteered, of whom 9 are male and 1 is female. Their mean age is 28 years. Five of the subjects had experience with SSVEP-based BCI experiments and the rest were naive. Before participating in the experiment, all the participants signed a written informed consent form. A 12-class SSVEP dataset was recorded using a Biosemi ActiveTwo EEG system (Biosemi, Inc.) with 8 Ag/AgCl electrodes covering the occipital area in a simulated online BCI experiment. The 12 stimuli are designed using a joint frequency and phase coding method. The frequencies range from 9.25 Hz to 14.75 Hz with an interval of 0.5 Hz, and the phases are 0, 0.5π, π, and 1.5π. The sampling rate of the recorded EEG signals is 2048 Hz. All the subjects are asked to look at 12 flashing stimuli arranged in a 4 × 3 grid of 6 cm × 6 cm squares that represents a numeric keypad. The EEG recording consists of 15 trials. In each trial, the subject is instructed to focus for 4 s on one of the stimuli, indicated in random order, and to go through all 12 targets. The recorded signals are then downsampled to 256 Hz. A latency of 135 ms in the visual pathway is considered [28]. In the offline analysis, time 0 indicates the stimulus onset and the data length is 4.00 s.

B. DATASET II (35 SUBJECTS, 40 STIMULI)
Dataset II was prepared by the Tsinghua group [29]. It is publicly available and is called the Benchmark SSVEP Dataset. The data are recorded from an SSVEP-based BCI experiment using a cue-guided target selection task. Thirty-five healthy subjects participated, of whom 18 are male and 17 are female. Their mean age is 22 years. Only 8 of the subjects had experience with SSVEP-based BCI experiments. The 40-class SSVEP dataset is recorded using a Synamps2 system (Neuroscan, Inc.). A total of 64 electrodes are placed according to the 64-channel extended international 10-20 standard to capture the EEG signal. The EEG signals from 9 channels (Pz, PO5, PO3, POz, PO4, PO6, O1, Oz, and O2) covering the occipital lobe of the brain are selected for offline analysis. The ground electrode (GND) is positioned midway between the Fz and FPz electrodes. The reference electrode is placed on the vertex (Cz). The frequencies of the 40 stimuli range from 8 Hz to 15.8 Hz with an interval of 0.2 Hz. The EEG signals are recorded at a sampling rate of 1000 Hz and downsampled to 250 Hz. During the experiments, each subject is asked to spell for 6 trials. During the recording of EEG in each trial, the subject is asked to gaze at 40 visual stimuli corresponding to 40 stimulus frequencies in random order. Each trial lasts for 6 s, including 0.5 s for the visual cue and 0.5 s for the inter-stimulus interval. A latency of 140 ms in the visual pathway is considered. In the offline analysis, the prestimulus period is 0.5 s, the poststimulus period ends at 5.50 s, and the data length is 5.00 s.

III. METHODOLOGY

A. CORRELATED COMPONENT ANALYSIS
Correlated component analysis (CORRCA) is a technique that maximizes the Pearson product-moment correlation coefficient between two multi-dimensional signals [26]. In contrast to CCA, CORRCA produces the same projection vector for the two sets of multivariate signals, such that the linear combinations of the two data sets are maximally correlated [30]. The projection vectors are required to be orthogonal in CCA; CORRCA relaxes this constraint. Unlike CCA, CORRCA assigns a single projection vector to the two sets of multivariate signals instead of two different projection vectors. Mathematically, CORRCA is an optimization problem whose projection vectors can be found by solving a generalized eigenvalue problem.
Consider two multivariate signals $X \in \mathbb{R}^{N_c \times N_s}$ and $Y \in \mathbb{R}^{N_c \times N_s}$, where $N_c$ is the number of channels and $N_s$ is the number of sample points. CORRCA seeks a projection vector $w \in \mathbb{R}^{N_c \times 1}$ such that the resulting linear combinations $x = w^T X$ and $y = w^T Y$ show maximum correlation.
The correlation coefficient ($\rho$) is

$$\rho = \frac{x y^T}{\sqrt{(x x^T)(y y^T)}} = \frac{w^T R_{12} w}{\sqrt{(w^T R_{11} w)(w^T R_{22} w)}}, \tag{1}$$

where the covariance matrices are $R_{11} = \frac{1}{N_s} X X^T$, $R_{22} = \frac{1}{N_s} Y Y^T$, $R_{12} = \frac{1}{N_s} X Y^T$, and $R_{21} = \frac{1}{N_s} Y X^T$. Differentiating Eq. (1) with respect to $w$ and setting the result to zero gives the following eigenvalue problem (assuming that $w^T R_{11} w = w^T R_{22} w$):

$$(R_{11} + R_{22})^{-1} (R_{12} + R_{21})\, w = \lambda w. \tag{2}$$

The principal eigenvector of $(R_{11} + R_{22})^{-1} (R_{12} + R_{21})$ corresponds to the maximum value of $\rho$; it maximizes the correlation coefficient between $x$ and $y$. Likewise, the eigenvector of the second largest eigenvalue corresponds to the second strongest correlation coefficient, which is obtained by projecting the data matrices onto that eigenvector. The remaining coefficients are derived similarly. In this study, a subband CORRCA-based spatial filtering technique is employed to recognize the frequency of SSVEP. Using Eq. (2), $N_c$ coefficients $\rho = [\rho_1, \rho_2, \ldots, \rho_{N_c}]$ are obtained. In standard CORRCA, only the maximal coefficient is used as the feature for frequency recognition. To recognize the frequency of a test signal $X \in \mathbb{R}^{N_c \times N_s}$, these coefficients are calculated with an individual template signal generated by taking the mean of the SSVEP across multiple trials at frequency $i$. The frequency of the template signal with the maximum correlation coefficient is taken as the frequency of the test signal,

$$f_X = \arg\max_{f_i} \rho_1^{(i)}, \quad i = 1, 2, \ldots, N_f,$$

where $\rho_1^{(i)}$ denotes the maximal coefficient obtained with the template of the $i$-th stimulus frequency.

Recently, the filter bank method has become popular in the development of modern BCI systems [31]. Filter bank technology has been found to improve the performance of different spatial filtering algorithms, for example the common spatial pattern (CSP) algorithm [32], [33] and CCA [20], [31]. Here, a filter bank is applied to CORRCA, and the resulting method is termed FBCORRCA. Five subbands ($S_n = 5$) are used. The lower cut-off frequency of the $i$-th subband is set to $i$ times the starting stimulus frequency of the dataset ($i = 1, 2, \ldots, S_n$), and the upper cut-off frequency is set to 80 Hz. A zero-phase Chebyshev Type I infinite impulse response (IIR) filter is used to extract each subband signal. The features from the subband signals are then combined using a procedure similar to that in [34] to recognize the frequencies of the SSVEPs.
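The CORRCA computation described above can be illustrated with a minimal Python sketch using NumPy and SciPy. It is not the authors' implementation: the function name `corrca_coefficients` is hypothetical, the per-channel mean removal is an added assumption (so that the projected coefficients are Pearson correlations), and the generalized symmetric eigenproblem $(R_{12} + R_{21}) w = \lambda (R_{11} + R_{22}) w$ is solved directly instead of forming the matrix inverse.

```python
import numpy as np
from scipy.linalg import eigh


def corrca_coefficients(X, Y):
    """Return Nc correlation coefficients for X, Y of shape (Nc, Ns).

    Coefficients are ordered by decreasing eigenvalue of the
    generalized problem (R12 + R21) w = lambda (R11 + R22) w.
    """
    # Assumption: remove the channel means so that the projected
    # signals yield Pearson correlation coefficients.
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    n_samples = X.shape[1]

    # Covariance matrices as defined in the text.
    R11 = X @ X.T / n_samples
    R22 = Y @ Y.T / n_samples
    R12 = X @ Y.T / n_samples
    R21 = R12.T

    # eigh returns eigenvalues in ascending order; reverse them.
    eigvals, W = eigh(R12 + R21, R11 + R22)
    W = W[:, np.argsort(eigvals)[::-1]]

    # Project both signals onto every eigenvector (one row each).
    x = W.T @ X
    y = W.T @ Y
    num = np.sum(x * y, axis=1)
    den = np.sqrt(np.sum(x * x, axis=1) * np.sum(y * y, axis=1))
    return num / den
```

In standard CORRCA only the first returned value would be used as the feature; the remaining values are the additional coefficients that the fusion strategy exploits.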

C. PROPOSED METHOD
In this study, a CORRCA-based spatio-spectral feature fusion framework is used to generate more discriminative features for recognizing the frequencies of the SSVEPs. The implementation of the proposed framework can be divided into three major stages: (1) subband decomposition, (2) feature fusion, and (3) frequency recognition. The details of each stage are described in the following subsections.

1) SUBBAND DECOMPOSITION
The SSVEP signals of a particular subject are decomposed into a finite set of narrowband signals using bandpass filters. The four-dimensional data are represented by $N_c \times N_s \times N_t \times N_f$, where $N_c$ is the number of channels, $N_s$ the number of sample points, $N_t$ the number of trials, and $N_f$ the number of stimuli. The dimension of Dataset I is 8 × 1024 × 15 × 12 and that of Dataset II is 9 × 1250 × 6 × 40. The narrowband signals enhance the performance of SSVEP frequency recognition [20]. The signals of all channels and all trials are filtered using a bank of bandpass filters for each stimulus frequency. A Chebyshev Type I filter of order 12 with a passband ripple of r = 3 dB is used for subband decomposition. The $S_n$ bandpass filters generate $S_n$ frequency bands of the original signal. Each bandpass filter covers multiple harmonics and has the same upper cutoff frequency of 80 Hz. The lower cutoff frequency of the $m$-th bandpass filter is $m \times f_0$, where $f_0$ is the starting stimulus frequency and $m = 1, 2, \ldots, S_n$. The five ($S_n = 5$) frequency bands of the time-domain SSVEP signal are shown in Fig. 1. The signals are obtained from the first channel of the first stimulus, with flickering frequency 9.25 Hz, for subject 's1' of Dataset I. The test signal is represented by $X \in \mathbb{R}^{N_c \times N_s}$ and the template signals of the $i$-th frequency by $Y_i^1$ and $Y_i^2 \in \mathbb{R}^{N_c \times N_s}$, with $i = 1, 2, \ldots, N_f$. The template signals are generated from the training trials as follows: $Y_i^1$ is the mean of the first half of the training trials for the $i$-th stimulus frequency, and $Y_i^2$ is the mean of the last half of the training trials for the $i$-th stimulus frequency. The subbands of the test signal are represented by $X_1, X_2, \ldots, X_{S_n}$, and the template signals are decomposed into subbands in the same way. The spatial features obtained by CORRCA in each subband are fused.
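The subband decomposition and the construction of the two half-templates described above could be sketched as follows in Python with SciPy. This is an illustrative sketch under stated assumptions, not the authors' code: the function names `filter_bank` and `half_templates` are hypothetical, `order=6` is chosen because a SciPy bandpass design of order 6 yields a 12th-order filter, the ripple value is passed as the passband ripple of the Chebyshev Type I design, and an odd number of training trials is split with the extra trial in the second half.

```python
import numpy as np
from scipy.signal import cheby1, sosfiltfilt


def filter_bank(data, fs, f0, n_subbands=5, f_high=80.0,
                order=6, ripple_db=3.0):
    """Band-pass `data` (..., Ns) into subbands [m * f0, f_high] Hz.

    Returns an array of shape (n_subbands, ...), one zero-phase
    filtered copy of the input per subband m = 1, ..., n_subbands.
    """
    subbands = []
    for m in range(1, n_subbands + 1):
        # Second-order sections keep the high-order filter stable.
        sos = cheby1(order, ripple_db, [m * f0, f_high],
                     btype="bandpass", fs=fs, output="sos")
        # Forward-backward filtering gives zero phase distortion.
        subbands.append(sosfiltfilt(sos, data, axis=-1))
    return np.stack(subbands)


def half_templates(train):
    """Average each half of the training trials.

    `train` has shape (Nc, Ns, Nt); returns (Y1, Y2), each (Nc, Ns).
    Assumption: with an odd Nt the second half gets the extra trial.
    """
    half = train.shape[2] // 2
    y1 = train[:, :, :half].mean(axis=2)
    y2 = train[:, :, half:].mean(axis=2)
    return y1, y2
```

For Dataset I, a call such as `filter_bank(trial, fs=256, f0=9.25)` would produce lower cut-off frequencies of 9.25, 18.5, 27.75, 37, and 46.25 Hz.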
2) FEATURE FUSION
In standard CORRCA, each individual coefficient provides discriminative information for recognizing the frequencies of different stimuli. Fig. 2 (a) presents the mean classification accuracy of each individual correlation coefficient for four different time windows. The lengths of the time windows are 0.25 s, 0.50 s, 0.75 s, and 1.00 s. The mean accuracy is observed to decrease nonlinearly with the index of the coefficient. If the coefficients are combined using a nonlinear weighting function Phi ($\phi$), as shown in Fig. 2 (b), the overall performance of the system is expected to improve. This leads to the spatial feature fusion strategy of this study. We therefore fuse these features, or coefficients, with the weights $\phi_1, \phi_2, \phi_3, \ldots, \phi_{2N_c}$ given by the nonlinear weighting function $\phi_k = e^{-a_1 k} + b_1$, where $k = 1, 2, \ldots, 2N_c$ and the function parameters are $a_1 = 0.6$ and $b_1 = 0.2$ for both datasets (Dataset I and Dataset II). The values of the parameters are optimized using grid search. The nonlinear weighting function is used to fuse the $2N_c$ correlation coefficients. Each coefficient obtained by CORRCA represents different channels, that is, different spatial locations of the occipital region of the human brain. The spatially fused features are generated by fusing these coefficients: the spatially fused feature for the $m$-th frequency band is the weighted sum of the merged and sorted coefficients with the weights $\phi_k$, where $k = 1, 2, \ldots, 2N_c$ is the index of the coefficients and $m = 1, 2, \ldots, S_n$ is the index of the subbands.
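A minimal Python sketch of this spatial fusion step for one subband is given below. The weighting function and the parameter values $a_1 = 0.6$, $b_1 = 0.2$ come from the text; the function name `spatial_fusion` is hypothetical, and the sketch assumes that the coefficients enter the weighted sum directly (for example, without squaring), which the text does not state explicitly.

```python
import numpy as np


def spatial_fusion(rho_1, rho_2, a1=0.6, b1=0.2):
    """Fuse the coefficients from the two templates for one subband.

    rho_1, rho_2: arrays of Nc CORRCA coefficients obtained with the
    first and the second template signal, respectively.
    """
    # Merge the two sets and sort them in descending order.
    merged = np.sort(np.concatenate([rho_1, rho_2]))[::-1]
    # Nonlinear weights phi_k = exp(-a1 * k) + b1, k = 1, ..., 2Nc.
    k = np.arange(1, merged.size + 1)
    phi = np.exp(-a1 * k) + b1
    # Assumption: plain weighted sum of the sorted coefficients.
    return float(np.sum(phi * merged))
```

Because the weights decay with $k$, the largest coefficients dominate the fused value while the smaller ones still contribute through the offset $b_1$.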
The spatially fused features of all frequency bands are fused using a nonlinear weighted sum. The coefficient of each frequency band provides discriminative information for recognizing the stimulus frequencies. The mean classification accuracy obtained with the feature of each frequency band for four different time windows is presented in Fig. 3 (a). The lengths of the time windows are 0.25 s, 0.50 s, 0.75 s, and 1.00 s. The mean accuracy is observed to decrease nonlinearly as the index of the frequency band increases, yet each of the features contributes to frequency recognition. Hence, higher recognition accuracy can be expected when all the coefficients are combined. The nonlinear weighting function Omega ($\omega$) intended for fusing the coefficients of all frequency bands is illustrated in Fig. 3 (b). This leads to the spectral feature fusion strategy used in this study.
For the test signal $X$ and the pair of template signals $Y_i^1$ and $Y_i^2$ of the $i$-th stimulus frequency, the spatially fused features of all subbands are prepared for spectral feature fusion. The spectrally fused coefficient is obtained as the weighted sum of these features over the $S_n$ frequency bands, using the weights given by $\omega$.

3) FREQUENCY RECOGNITION
The final stage is the recognition of the stimulus frequency of the test SSVEP. The final coefficients obtained for all stimulus frequencies are denoted $\psi_1, \ldots, \psi_i, \ldots, \psi_{N_f}$. Finally, the frequency of the template signal with the maximum value of $\psi_i$ is selected as the frequency $f_X$ of the test signal $X$:

$$f_X = \arg\max_{f_i} \psi_i, \quad i = 1, 2, \ldots, N_f.$$

The block diagram of the proposed frequency recognition method is illustrated in Fig. 4.
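The spectral fusion and the final decision rule can be sketched in Python as follows. The sketch is hedged in two respects: the functional form of the spectral weights is assumed here to mirror the exponential form of the spatial weights, and the values `a2=0.5` and `b2=0.1` are hypothetical placeholders, not the optimized values of the paper. The function name `recognize_frequency` is likewise illustrative.

```python
import numpy as np


def recognize_frequency(fused, freqs, a2=0.5, b2=0.1):
    """Pick the stimulus frequency with the largest fused coefficient.

    fused: array (Nf, Sn) of spatially fused features, one row per
           candidate stimulus frequency and one column per subband.
    freqs: sequence of the Nf stimulus frequencies in Hz.
    Returns (recognized frequency, psi), psi having shape (Nf,).
    """
    fused = np.asarray(fused, dtype=float)
    m = np.arange(1, fused.shape[1] + 1)
    # Assumed weight form and placeholder values (not from the paper).
    omega = np.exp(-a2 * m) + b2
    # Weighted sum over subbands gives one coefficient per stimulus.
    psi = fused @ omega
    return freqs[int(np.argmax(psi))], psi
```

For instance, with three candidate frequencies `[9.25, 9.75, 10.25]` and fused features whose second row is largest in every subband, the function returns 9.75 Hz.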

D. PERFORMANCE EVALUATION
The classification accuracy and the information transfer rate (ITR) are used as the metrics for evaluating the performance of the proposed method [1]. An extensive comparison of the proposed method with the traditional spatial filtering method is carried out in this study. Four different data lengths (0.25 s, 0.50 s, 0.75 s, and 1.00 s) are considered when calculating accuracy and ITR to evaluate the BCI performance. A leave-one-out cross-validation (LOOCV) technique is employed to measure the accuracy. LOOCV is a variant of K-fold cross-validation in which the number of folds equals the number of instances or observations in the data set, k = n. Each instance is used once as a separate test set, and all the remaining instances form the training set. Thus, for n instances, there are n different training sets and n different test sets. We have used two different SSVEP datasets (Dataset I and Dataset II) to evaluate the performance of the proposed method. To perform the LOOCV, ($N_t$ − 1) trials are used for training and the remaining trial is used for testing, for each of the datasets. The number of iterations is equal to the total number of instances or trials ($N_t$). Finally, the accuracy is calculated by averaging the accuracies of the individual iterations:

$$\mathrm{Accuracy}_{\mathrm{subject}} = \frac{1}{N_t} \sum_{t=1}^{N_t} \mathrm{Accuracy}_t,$$

where $\mathrm{Accuracy}_t$ is the accuracy of the $t$-th iteration. This gives the accuracy of an individual subject, with the subscript denoting the index of the subject in each dataset. The accuracies of all subjects are then averaged to obtain the mean accuracy for the particular dataset.
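The leave-one-out procedure for a single subject can be sketched in Python as below. This is a schematic illustration only: the function name `loocv_accuracy` and the `classify` callback (which stands in for any frequency recognition method and returns a predicted class index) are hypothetical, and the data layout $N_c \times N_s \times N_t \times N_f$ follows the description of the datasets.

```python
import numpy as np


def loocv_accuracy(data, classify):
    """Leave-one-trial-out accuracy for one subject.

    data:     array (Nc, Ns, Nt, Nf).
    classify: callable(train, test_trial) -> predicted stimulus index,
              where train is (Nc, Ns, Nt - 1, Nf) and test_trial is
              (Nc, Ns).
    """
    n_trials, n_classes = data.shape[2], data.shape[3]
    fold_accuracies = []
    for t in range(n_trials):
        # Hold out trial t of every stimulus; train on the rest.
        train = np.delete(data, t, axis=2)
        correct = sum(
            classify(train, data[:, :, t, i]) == i
            for i in range(n_classes)
        )
        fold_accuracies.append(correct / n_classes)
    # Average over the Nt iterations.
    return float(np.mean(fold_accuracies))
```

The mean accuracy of a dataset would then be the average of this value over all subjects.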
In addition to classification accuracy, the BCI performance of the proposed method is also evaluated by ITR. It is a standard measure of the amount of information communicated per unit of time by a communication system. The ITR, in bits per minute, is calculated using the following formula:

$$\mathrm{ITR} = \frac{60}{T} \left[ \log_2 N_f + P \log_2 P + (1 - P) \log_2 \left( \frac{1 - P}{N_f - 1} \right) \right].$$

Here, $P$ is the normalized classification accuracy, $N_f$ is the total number of stimuli or commands, and $T$ is the average time for a selection (seconds/selection) of a specific time window. The gazing time G is 1.00 s and 0.50 s for Dataset I and Dataset II, respectively [28], [35]. The feature values have been evaluated across all subjects for different stimulation frequencies to evaluate the performance of different methods. In this study, the average computational time required for single-trial target detection by each method has been estimated. The total time required for feature extraction and classification is considered as the computational time.
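As a small worked illustration, the ITR formula can be evaluated in Python as follows. The function name `itr_bits_per_min` is hypothetical, and the guards for $P = 0$ and $P = 1$ simply apply the convention $0 \cdot \log_2 0 = 0$. In the usage note, taking the selection time as the data length plus the gazing time G is an assumption about how $T$ is composed.

```python
import math


def itr_bits_per_min(p, n_classes, t_sel):
    """Information transfer rate in bits per minute.

    p:         classification accuracy in [0, 1].
    n_classes: number of stimuli or commands (Nf).
    t_sel:     average time per selection in seconds (T).
    """
    bits = math.log2(n_classes)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_classes - 1))
    return 60.0 / t_sel * bits
```

For example, under the stated assumption, a 1.00 s window on Dataset II would use `t_sel = 1.00 + 0.50` with `n_classes = 40`; at chance-level accuracy (`p = 1 / n_classes`) the formula evaluates to zero.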

IV. EXPERIMENTAL RESULTS
The average classification accuracy and ITR over the subjects for time windows ranging from 0.25 s to 1.00 s with a step size of 0.25 s are shown in Fig. 5. The subplots (a) and (b) of Fig. 5 ... In all cases, the p-value is less than 0.05, and hence the proposed method yields a statistically significant improvement in classification accuracy as well as ITR.
The experimental results illustrate that the proposed framework is feasible to enhance the performance of the standard CORRCA. Moreover, the standard deviation of classification accuracies using the proposed method is decreased significantly compared to the standard CORRCA. It indicates that the proposed method improves the reliability of frequency detection for all stimulus frequencies.
The results demonstrate that the proposed framework is able to produce robust features compared to the existing methods.
In the proposed framework, subband decomposition is used to enhance SSVEP frequency recognition accuracy by including multiple harmonics of the stimulus frequency. The average SSVEP frequency recognition accuracies of the proposed method across all subjects for Dataset I and Dataset II are illustrated in Fig. 6 as a function of the number of subbands ($S_n$). This analysis is used to determine the optimal number of subbands. The highest accuracies are achieved with $S_n$ = 2 as well as $S_n$ = 3 for Dataset I. The minimum error is incurred with $S_n$ = 3 for most of the time windows. A similar pattern is observed in the results with Dataset II. Hence, 3 subbands are used for producing the final results with Dataset I as well as Dataset II. All of the coefficients obtained from the two template signals are used in feature fusion. The classification accuracy and ITR of the proposed method are improved compared to the standard spatial filtering (CORRCA) method.
TABLE 1. Performance comparison of the proposed method with other existing methods for the stimulus frequency recognition of SSVEPs. The performance is evaluated in terms of accuracies (%) and ITRs (bits min$^{-1}$) of Dataset I and Dataset II for 1.00 s data length.
The performance of the proposed CORRCA-based feature fusion method is compared with several recently developed algorithms for SSVEP frequency recognition in terms of recognition accuracy and ITR using Dataset I and Dataset II, as presented in Table 1. The comparative study illustrates that it outperforms the state-of-the-art methods in terms of frequency recognition accuracy as well as ITR for both datasets.

V. DISCUSSION
The electroencephalogram (EEG) is the most studied modality for BCI implementation. Among the various EEG signals used in BCI research, the SSVEP has recently gained popularity because of its high SNR, low user training requirement, and low computational cost. The SSVEP signal is less susceptible to contamination by noise and external artifacts. Generally, the discriminative information of the projected signals or features is lost when they are corrupted by these components [19], [25]. The main aim of researchers is to develop highly efficient algorithms with low computational cost for classifying SSVEP signals. A variety of algorithms have been proposed that utilize different types of BCI modalities [42]. CCA is one of the most popular and broadly used methods for frequency recognition of SSVEP signals in the BCI research community. Due to the requirement of longer calibration time, the performance of CCA-based methods is not satisfactory. Various CCA-based approaches have been proposed to enhance their performance. For example, a hybrid subject correlation analysis (HSCA) was proposed in [36] to improve the performance of SSVEP-based BCI. It is an advanced CCA-based algorithm. A set of artificially generated sinusoids is used as the reference signals in the traditional CCA-based method. The HSCA method combines the training data of the target subject and of other subjects at the same time, which helps to overcome the drawbacks of CCA-based methods [36]. The temporally local structure of samples is not properly considered in CCA. In temporally local canonical correlation analysis (TCCA) [37], the original covariance matrix is replaced by the temporally local covariance matrix. Furthermore, the authors applied a filter bank to TCCA, named filter bank TCCA (FBTCCA) [37], which achieves higher performance for SSVEP recognition. The CCA-M3 [39] method derives the spatial filters using training data.
The observed EEG training data and their SSVEP components are used as the two inputs of CCA to obtain the spatial filters. The objective function is optimized by averaging multiple training trials. Another method named exactly periodic subspace decomposition (EPSD) [41], utilizes the periodic properties of the SSVEP components to achieve a robust spatial filter for frequency recognition. The SSVEP components are extracted by projecting the EEG data onto a subspace of target signal components. The convolutional neural network (CNN) is employed with user-dependent (UD) and complex spectrum features for the detection of SSVEP [38]. It is named as UD-C-CNN which has user-independent (UI) and user-dependent (UD) training scenarios. It is found that the UD-based training methods consistently outperform UI. A multivariate synchronization index (MSI) is another efficient method for recognizing the frequency of SSVEP. The inter-subject and intra-subject template signals based MSI (IIST-MSI) [40] improves the performance of the standard MSI approach. It also uses the dynamic window to extract the temporal features of SSVEPs.
Recently, CORRCA-based methods have achieved better performance than other traditional spatial filtering techniques for classifying SSVEP signals [34]. CORRCA needs to compute only one projection vector, whereas two mutually orthogonal projection vectors are required for CCA. It thus relaxes the orthogonality restriction and reduces the computational cost compared to CCA. In this study, discriminative features are generated by fusing features at multiple levels. The fusion of features extracted from multiple subbands using CORRCA is employed to enhance the frequency recognition accuracy of SSVEP signals. In terms of frequency recognition accuracy and ITR, the proposed approach outperforms the other comparable algorithms. The CORRCA-based spatial filter produces as many correlation coefficients as there are channels. These coefficients are generated by solving a generalized eigenvalue problem. The standard CORRCA exploits only the largest coefficient, corresponding to the largest eigenvalue, as the feature for SSVEP frequency recognition. The remaining coefficients are discarded, whereas all the coefficients are considered in the proposed method. Including the weighted version of all the coefficients enhances the performance of frequency recognition.
The accuracies of the individual coefficients using different template signals (combined template signals $Y_i^1$ and $Y_i^2$, first template signal $Y_i^1$, second template signal $Y_i^2$, single template signal $Y_i$) for the $i$-th stimulus frequency are illustrated in Fig. 7 for Dataset I. Each of the coefficients is observed to provide different information for recognizing the frequency. The combined template signals achieve higher performance. Hence, fusing the correlation coefficients generated by the CORRCA spatial filter enhances the overall performance. In the proposed method, the spatially fused features are generated by fusing the coefficients obtained by CORRCA in each frequency band. The subband approach is widely used to produce more discriminative features in SSVEP as well as motor imagery based BCI [20], [42]. Fig. 6 shows that each frequency band provides discriminative information. Therefore, the spatially fused features of all frequency bands are fused to generate the spatio-spectral features. The performance of the proposed CORRCA-based method is enhanced compared to standard CORRCA by using the fused features obtained from multiple reference signals and multiple frequency bands. The ITR is an important metric for measuring the performance of a BCI system. For the different time windows, the ITR is also improved compared to the standard CORRCA for both datasets. The suitability of the proposed framework for use in a practical environment can be verified by its computational cost. The computational cost is assessed by the time taken to recognize the frequency of a single-trial SSVEP. The performance of the proposed method has been evaluated for different time windows using MATLAB R2018a on a computer with an Intel(R) Core(TM) i5-7200U CPU @ 2.50 GHz and 8.00 GB RAM.
Although the computational time of the proposed approach is longer than that of the standard CORRCA method, the proposed approach still executes within a short period (less than 0.10 s) for both Dataset I and Dataset II. This is sufficient to meet the online application requirements of SSVEP-based BCIs.
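The per-trial recognition time described above can be measured along the following lines. This is only an illustrative Python sketch (the reported timings were obtained in MATLAB). The helper `mean_recognition_time` and the `recognize` callable are hypothetical placeholders for whatever recognition routine is being timed.

```python
import time


def mean_recognition_time(recognize, trials, repeats=1):
    """Average wall-clock seconds that `recognize` needs per single trial.

    recognize: callable taking one trial (hypothetical recognition routine).
    trials: non-empty sequence of trials.
    repeats: number of passes over the trials, to smooth timer noise.
    """
    if not trials:
        raise ValueError("trials must not be empty")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    start = time.perf_counter()
    for _ in range(repeats):
        for trial in trials:
            recognize(trial)
    elapsed = time.perf_counter() - start

    return elapsed / (repeats * len(trials))
```

Comparing the returned value with the stimulation window length gives a rough indication of whether recognition can keep up with online use.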
A comparison of the computational cost of the proposed method is illustrated in Fig. 8 for various time windows using Dataset I and Dataset II. Given its promising performance and low computational cost, the method is suitable for real-time application in SSVEP-based BCI systems. One limitation of SSVEP-based BCIs is that attending to low-frequency visual stimuli increases subject fatigue [43], [44]. The proposed method aims to decrease the calibration time so that subjects spend less time looking at the stimuli and their fatigue is reduced.
A spatial and spectral feature fusion framework is proposed for recognizing SSVEPs. The key contribution of this work is a CORRCA spatial filtering-based spatio-spectral feature fusion framework in which two template signals are used. The proposed method has some limitations. The optimum values of the two parameters a and b used in the nonlinear weighting functions for both spatial and spectral feature fusion are determined by a grid search, which is somewhat computationally expensive. We have also optimized the parameter Sn, which represents the number of frequency bands used in spectral feature fusion. These parameters could instead be estimated using a machine learning approach. The performance of the proposed method has been studied offline using two publicly available SSVEP datasets. In the future, online BCI experiments are planned to evaluate its performance. The main goal of SSVEP-based BCI technology is to assist physically disabled persons or paralyzed patients who are unable to talk and/or to move their muscles. The development of an SSVEP-based alphanumeric keyboard is considered as a future extension of this work.
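The grid search over a and b mentioned above can be sketched as follows. The weighting form w(k) = k^(-a) + b is an assumption borrowed from filter-bank methods, as are the plain weighted sum, the use of one (a, b) pair for both fusion stages, and all function names; the excerpt does not give the exact functions, so this is an illustration rather than the authors' procedure.

```python
from itertools import product


def weights(count, a, b):
    """Assumed nonlinear weights w(k) = k**(-a) + b for k = 1..count."""
    return [k ** (-a) + b for k in range(1, count + 1)]


def fuse(values, a, b):
    """Weighted sum of values; earlier entries receive larger weights."""
    return sum(w * v for w, v in zip(weights(len(values), a, b), values))


def spatio_spectral_score(subband_coeffs, a, b):
    """Fuse coefficients within each subband, then fuse across subbands.

    subband_coeffs: one list of merged correlation coefficients per subband.
    """
    per_band = [fuse(sorted(c, reverse=True), a, b) for c in subband_coeffs]
    return fuse(per_band, a, b)


def predict(trial_coeffs, a, b):
    """Index of the candidate frequency with the highest fused score.

    trial_coeffs[f][s] holds the coefficients for frequency f and subband s.
    """
    scores = [spatio_spectral_score(bands, a, b) for bands in trial_coeffs]
    return max(range(len(scores)), key=scores.__getitem__)


def grid_search(trials, labels, a_grid, b_grid):
    """Return (a, b, accuracy) for the best grid point on the given trials."""
    best = (None, None, -1.0)
    for a, b in product(a_grid, b_grid):
        hits = sum(predict(t, a, b) == y for t, y in zip(trials, labels))
        accuracy = hits / len(labels)
        if accuracy > best[2]:  # strict comparison keeps the first best point
            best = (a, b, accuracy)
    return best
```

The cost grows with the product of the two grid sizes, since every (a, b) pair requires re-scoring all training trials, which is consistent with the remark that the search is somewhat expensive.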

VI. CONCLUSION
In this study, a multistage feature fusion framework is proposed to recognize the stimulus frequencies of SSVEP signals. The CORRCA-based spatial filtering approach is used to extract the features of SSVEP signals collected from EEG sensors spatially distributed on the scalp. The reference signal together with the SSVEP is used in CORRCA to extract the features. The reference signals are generated from the training trials instead of from artificially generated sinusoids. The features derived using CORRCA are made more discriminative by fusing them in multiple steps, namely spatial feature fusion and spectral feature fusion. To fuse the features, the proposed framework uses nonlinear weighting functions at both the spatial and spectral fusion levels. As a result, more robust and discriminative features are obtained for frequency recognition. The experimental evaluation is performed on two publicly available standard SSVEP datasets, denoted here by Dataset I and Dataset II. The first dataset contains SSVEP data of twelve target stimuli from ten subjects, and the second dataset contains SSVEP data of forty target stimuli from thirty-five subjects. The experimental results show that the proposed framework outperforms the state-of-the-art SSVEP frequency recognition methods. A quantitative comparison is performed among the standard CORRCA, FBCORRCA, and the proposed multistage feature fusion based target identification method for SSVEP-based BCIs. The proposed method significantly outperforms the standard CORRCA and FBCORRCA methods. It also outperforms other traditional SSVEP frequency recognition methods. It can therefore be a promising choice for achieving satisfactory frequency recognition performance in SSVEP-based BCI applications.
Studying the performance of the proposed method on additional datasets, with the aim of implementing an effective and reliable frequency recognition system for SSVEP-based BCI applications, is considered as a future extension of this work.