Attentional State Classification Using Amplitude and Phase Feature Extraction Method Based on Filter Bank and Riemannian Manifold

As a significant aspect of cognition, attention has been extensively studied, and numerous measurements have been developed based on brain signal processing. Although existing attentional state classification methods have achieved good accuracy by extracting a variety of handcrafted features, spatial features have not been fully explored. This paper proposes an attentional state classification method based on the Riemannian manifold to utilize spatial information. Based on the concept of the Riemannian manifold of symmetric positive definite (SPD) matrices, the proposed method exploits the structure of the covariance matrix to extract spatial features instead of using spatial filters. Specifically, Riemannian distances from intra-class Riemannian means are extracted as features for their robustness. To fully exploit the potential of the electroencephalography (EEG) signal, both amplitude and phase information are utilized. In addition, to address the variability across frequency bands, a filter bank is employed to process the signals of different frequency bands separately. Finally, the features are fed into a support vector machine with a polynomial kernel to obtain classification results. The proposed method, attentional state classification using amplitude and phase feature extraction based on filter bank and Riemannian manifold (AP-FBRM), is evaluated on two open datasets containing EEG data of 29 and 26 subjects, respectively. Based on the experimental results, the optimal filter bank setting and the optimal technique for extracting features containing both amplitude and phase information are determined. The proposed method achieves accuracies of 88.06% and 80.00% on the two datasets and outperforms 8 baseline methods, which indicates that it provides an efficient way to recognize attentional states.


I. INTRODUCTION
COGNITION is a high-level brain process, which includes multiple forms of knowing and awareness, such as conceiving, judging, imagining, attention, and problem solving [1]. Among them, attention refers to the ability to orient ourselves towards relevant stimuli and consequently respond to them, which underlies almost every task in daily life [2]. It plays a vital role in education, public safety, medical care, and social production [3]. Lack of attentional ability is associated with disorders such as attention deficit hyperactivity disorder (ADHD) and other mental problems [4], [5]. Given the importance of attention, it is of great significance to measure attentional states. Existing subjective methods, such as questionnaires and psychological tests, help to measure the attentional state of a person [6], [7]. However, these tools are sometimes unreliable because of subjects' unclear memory or dishonesty. To address this problem, a variety of measurements based on physiological signals, which are objective and trustworthy, have emerged to measure the attentional state [8].
As a harmless and convenient way to record brain activity, EEG has been widely used to measure attention and other cognitive processes [9], [10]. Existing studies have contributed to revealing the connections between EEG signals and attentional states. For example, reductions of alpha activity accompanied by increases in beta band fluctuations can indicate increases in attentional approach tendencies and vigilance [11], [12], [13]. Other researchers provide evidence that an increase in the spectral power of EEG slow-wave activity (theta band) is related to attenuated attentional control [14], [15]. In addition, spatially selective visual attention and visual information processing have been shown to modulate spectral gamma band power, and increased gamma oscillations are concurrent with visual attentional perception [16]. Other researchers distinguish different attentional states using the event related potential (ERP) signal [17], since the absence of mind leads to the absence of the ERP signal while watching visual stimuli.
Based on the direct relationship between EEG signals and attentional states, a large number of feature extraction methods have been devised to realize attentional state classification. The spectral power of each channel is extracted as the feature to realize attentional state estimation [18]. Dynamical microstates of local and global duration are extracted as effective features for attention recognition [19]. Li et al. applied correlation analysis to extract features related to anxiety level using a 5-point Likert scale [20]. Various features including band ratios, cognition index, entropies, and functional connectivity metrics are selected by several feature selection techniques [21]. Different kinds of classifiers are employed to obtain better accuracy for cognition classification, such as support vector machine (SVM) [22], k-nearest neighbour (kNN) [23], and linear discriminant analysis (LDA) [24].
Although methods based on the aforementioned handcrafted features have achieved good accuracy, spatial information, which matters for most cognition classification tasks such as motor imagery (MI) based BCI [25], has not been fully utilized to realize attentional state classification. Riemannian geometry methods exploit the spatial covariance structure and generate smooth manifolds from intrinsically nonlinear data spaces [26], and thus utilize the spatial information of EEG data. In [27], two Riemannian manifold-based MI classification methods are proposed. The first method compares the Riemannian distance to the mean of each class, while the second method projects the covariance matrices onto the tangent space and performs an LDA for classification. The latter method achieves an accuracy improvement of 5% compared to the common spatial pattern (CSP) method. Researchers in [28] combine the Riemannian geometry method with sparse optimization to extract robust spatial features, which outperforms the CSP-based method with improvements of 9.9% and 12.4% on two datasets. The competitive performance of these Riemannian manifold-based methods suggests that the Riemannian manifold is an efficient way to extract spatial information.
However, most Riemannian methods consider only the amplitude of EEG signals in the problem formulation. The potential information embedded in the phase and the frequency domains of EEG signals is ignored. To solve this problem, a novel algorithm also based on Riemannian analysis is proposed in this paper to enhance attentional state detection. Different from its conventional counterparts, the new algorithm constructs a series of independent Riemannian manifolds to extract information from the frequency and the phase domains. Specifically, a bank of filters is utilized to split the original broadband EEG signal into separate subband components, and subsequently, for each subband, two Riemannian manifolds are constructed to account for the amplitude and the phase features, respectively. The proposed algorithm is thus named joint amplitude and phase feature extraction method based on filter bank and Riemannian manifold (AP-FBRM), and the main contributions of this paper can be summarized as follows.
• Firstly, an attentional state classification method based on the Riemannian manifold is proposed, in which the covariance matrices are exploited for spatial information and Riemannian distances from intra-class Riemannian means are extracted as features for their robustness to noise.
• Secondly, to fully tap the potential of EEG signals, both the frequency and the phase domains are considered. A filter bank is employed to characterize the different features that lie in different subbands. In addition, the Hilbert transform is utilized to compute the phase angle of the original EEG signals, and three techniques are proposed to extract features containing both amplitude and phase information.
• Thirdly, the proposed method is validated and compared with multiple baselines on two open datasets, which comprise 29 and 26 subjects, respectively, and each distinguish two different attentional states. The effectiveness of the proposed method is verified by its superior performance.
The rest of the paper is organized as follows. The proposed attentional state classification method using amplitude and phase feature extraction based on filter bank and Riemannian manifold is described in Section II. In Section III, the datasets, the experimental setup used in this paper, and the experimental results are introduced. Next, discussions of the proposed work are given in Section IV, and finally the conclusion is drawn in Section V.

A. Riemannian Geometry
Riemannian geometry-based methods have already been introduced to address BCI problems [29]. Let $P(n) = \{P \in S(n) \mid u^{T} P u > 0,\ \forall u \in \mathbb{R}^{n},\ u \neq 0\}$ denote the space of n-dimensional real symmetric positive definite (SPD) matrices, where S(n) is the space of n-dimensional real symmetric matrices. The eigenvalues of an SPD matrix are real and positive. SPD matrices with n dimensions form a differentiable Riemannian manifold of dimension n(n + 1)/2.
1) Riemannian Distance: The Riemannian distance is defined as the minimum length of all paths between two points on the Riemannian manifold. As mentioned, each point on the Riemannian manifold is an SPD matrix. Thus, the Riemannian distance between two SPD matrices $P_1$ and $P_2$ in P(n) is given by [27]

$$\delta_R(P_1, P_2) = \left\| \mathrm{logm}\left(P_1^{-1} P_2\right) \right\|_F = \left[ \sum_{i=1}^{n} \log^2 \lambda_i \right]^{1/2}, \tag{1}$$

where $\lambda_i$, $i \in [1 : n]$, are the eigenvalues of the matrix $P_1^{-1} P_2$ and $\|\cdot\|_F$ denotes the Frobenius norm. The operator logm is the matrix logarithm. For an SPD matrix P, logm(P) can be computed by diagonalizing P: $\mathrm{logm}(P) = V \log(\Lambda) V^{-1}$, where $\Lambda$ is the diagonal matrix of eigenvalues of P and V is the matrix of eigenvectors of P, so that the logarithm only needs to be applied to the diagonal elements.

2) Tangent Space: Given a point P on a Riemannian manifold $\mathcal{P}$, the tangent space $T_P$ is defined by the tangent vectors at P. Any other point Q on $\mathcal{P}$ can be projected onto the tangent space $T_P$ by the logarithmic mapping $K = \mathrm{Logm}_P(Q)$ [30]

$$\mathrm{Logm}_P(Q) = P^{1/2}\, \mathrm{logm}\left(P^{-1/2} Q P^{-1/2}\right) P^{1/2}. \tag{2}$$

Inversely, K can be projected back onto the Riemannian manifold by the exponential mapping

$$\mathrm{Expm}_P(K) = P^{1/2}\, \mathrm{expm}\left(P^{-1/2} K P^{-1/2}\right) P^{1/2}, \tag{3}$$

where logm and expm denote the matrix logarithm and the matrix exponential. As shown in Fig. 1, the Riemannian path is projected as a Euclidean path, which suggests that the Riemannian mean can be calculated by taking the arithmetic mean in the tangent space and projecting it back onto the Riemannian manifold using the Riemannian exponential mapping. The detailed process is described in the following.
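The distance and the two mappings above can be illustrated with a minimal NumPy/SciPy sketch. The function names (`riemann_distance`, `log_map`, `exp_map`) and the helper `_spd_func` are hypothetical and are not taken from the paper; the matrix functions are computed through an eigendecomposition, which assumes the inputs are well-conditioned SPD matrices.

```python
import numpy as np
from scipy.linalg import eigvalsh


def _spd_func(P, func):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(P)
    return (V * func(w)) @ V.T


def riemann_distance(P1, P2):
    """Riemannian distance: sqrt(sum_i log^2(lambda_i))."""
    # The generalized eigenvalues of the pair (P2, P1) are the
    # eigenvalues of P1^{-1} P2.
    lam = eigvalsh(P2, P1)
    return np.sqrt(np.sum(np.log(lam) ** 2))


def log_map(P, Q):
    """Project the SPD matrix Q onto the tangent space at P."""
    P_half = _spd_func(P, np.sqrt)
    P_ihalf = _spd_func(P, lambda w: 1.0 / np.sqrt(w))
    return P_half @ _spd_func(P_ihalf @ Q @ P_ihalf, np.log) @ P_half


def exp_map(P, K):
    """Project the symmetric tangent vector K at P back onto the manifold."""
    P_half = _spd_func(P, np.sqrt)
    P_ihalf = _spd_func(P, lambda w: 1.0 / np.sqrt(w))
    return P_half @ _spd_func(P_ihalf @ K @ P_ihalf, np.exp) @ P_half
```

Under these assumptions, `exp_map(P, log_map(P, Q))` should recover `Q` up to numerical error, and the distance is unchanged when both arguments are transformed as `W @ P @ W.T` with the same invertible `W`.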
3) Riemannian Mean: The Riemannian mean of N (N ≥ 1) given SPD matrices $(P_1, P_2, \ldots, P_N)$ is another important concept that is closely associated with the Riemannian distance. It is defined as

$$\mathfrak{G}(P_1, \ldots, P_N) = \arg\min_{P \in P(n)} \sum_{i=1}^{N} \delta_R^2(P, P_i). \tag{4}$$

Therefore, the Riemannian mean is also an SPD matrix, namely the one that minimizes the sum of squared Riemannian distances to a set of SPD matrices [31]. While the Riemannian mean exists and is unique, no closed-form solution is available, so an optimization problem has to be solved. To this end, an iterative algorithm is employed [29], which is shown in Algorithm 1.

Algorithm 1 (recoverable steps only): 4: Update $P^{(t+1)} = \mathrm{Expm}_{P^{(t)}}(S)$ by projecting S back onto the manifold. 5: end while

4) Feature Extraction on the Riemannian Manifold: Covariance matrices are symmetric positive definite, so they can be treated as points on the Riemannian manifold. A covariance matrix $V \in \mathbb{R}^{C \times C}$ can be computed by

$$V = \frac{1}{T-1} X X^{T}, \tag{5}$$

where $X \in \mathbb{R}^{C \times T}$ is a single trial of EEG data, C denotes the number of electrode channels, and T is the number of time samples in one trial. In Euclidean space, the Euclidean distance is used directly to measure the difference between two points. Similarly, the Riemannian distance measures the difference between two covariance matrices, which are two points on the Riemannian manifold. In this paper, we assume that the features related to the attentional state lie in the Riemannian distance from the intra-class Riemannian mean and are directly separable. Under this assumption, we extract the Riemannian distances from the intra-class Riemannian means as features for each covariance matrix of EEG data. Specifically, given the training trials of the two target classes, which represent two different attentional states, we first calculate the two intra-class Riemannian means

$$V_d = \arg\min_{P \in P(n)} \sum_{i \in \text{class } d} \delta_R^2(P, V_i), \tag{6}$$

where $V_d$ denotes the Riemannian mean of the d-th class.
Since we focus on the binary attentional state classification problem, there are two intra-class means, so that d = 1, 2. For the covariance matrix $V^{*}$ of an unlabeled EEG trial, the two Riemannian distances from the two means are taken as two features $\{t_1, t_2\}$:

$$t_d = \delta_R(V^{*}, V_d), \quad d = 1, 2. \tag{7}$$
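The mean estimation and the distance features can be sketched as follows. This is an illustrative implementation of the standard tangent-space iteration, not a transcription of Algorithm 1; the function names, the arithmetic-mean initialization, the tolerance, and the iteration cap are assumptions. The covariance estimate assumes zero-mean (for example, bandpass-filtered) signals.

```python
import numpy as np
from scipy.linalg import eigvalsh


def _spd_func(P, func):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(P)
    return (V * func(w)) @ V.T


def riemann_distance(P1, P2):
    """Riemannian distance between two SPD matrices."""
    return np.sqrt(np.sum(np.log(eigvalsh(P2, P1)) ** 2))


def covariance(X):
    """Sample covariance of one trial X with shape (channels, samples)."""
    return X @ X.T / (X.shape[1] - 1)


def riemann_mean(mats, tol=1e-8, max_iter=50):
    """Iteratively estimate the Riemannian mean of a list of SPD matrices."""
    P = np.mean(mats, axis=0)  # assumed starting point: arithmetic mean
    for _ in range(max_iter):
        P_half = _spd_func(P, np.sqrt)
        P_ihalf = _spd_func(P, lambda w: 1.0 / np.sqrt(w))
        # Arithmetic mean of the matrices in the (whitened) tangent space at P.
        S = np.mean(
            [_spd_func(P_ihalf @ M @ P_ihalf, np.log) for M in mats], axis=0
        )
        # Project the tangent-space mean back onto the manifold.
        P = P_half @ _spd_func(S, np.exp) @ P_half
        if np.linalg.norm(S, ord="fro") < tol:
            break
    return P


def distance_features(V, class_means):
    """Distances from one covariance matrix to each intra-class mean."""
    return np.array([riemann_distance(V, Vd) for Vd in class_means])
```

For a binary problem, `class_means` would hold the two means estimated from the training covariance matrices of each class, and `distance_features` returns the two-element feature vector for one trial.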

B. Feature Extraction Using Amplitude and Phase Information
To exploit the latent capacity of the recorded EEG data, we extract the phase information as features for classification. Following the previous work [32], the Hilbert transform is used to obtain the phase information. For a multi-channel EEG signal $X(t) = [x_1(t), x_2(t), \ldots, x_C(t)]^{T}$ with a dimension of C × T, the Hilbert transform of each channel x(t) is given by

$$\hat{x}(t) = x(t) * \frac{1}{\pi t}, \tag{8}$$

where $*$ is the convolution operation. Therefore, the analytic signal is given by

$$\Phi(t) = x(t) + j\hat{x}(t) = A(t) e^{j\varphi(t)}, \tag{9}$$

where j is the imaginary unit, A(t) is the amplitude, and φ(t) is the phase angle, which can be demodulated by the Hilbert transform [33]

$$A(t) = \sqrt{x^2(t) + \hat{x}^2(t)}, \qquad \varphi(t) = \arctan\frac{\hat{x}(t)}{x(t)}. \tag{10}$$

Consequently, both amplitude and phase information are acquired by the transformation above. In this paper, we propose three techniques to extract both amplitude and phase information: extracting x(t) and φ(t), extracting x(t) and $\hat{x}(t)$, and extracting x(t) and $\Phi(t)$. In the subsequent text, we use $\tilde{x}(t)$ to represent any of φ(t), $\hat{x}(t)$, or $\Phi(t)$ when introducing the proposed method. The results of the different techniques are compared later.
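As a minimal sketch of the transformation above, the amplitude and phase of a multi-channel trial can be obtained with `scipy.signal.hilbert`, which returns the analytic signal directly. The function name `amplitude_phase` is hypothetical. Note that `np.angle` uses the four-quadrant arctangent, so the phase lies in (−π, π], which is an implementation choice rather than something stated in the text.

```python
import numpy as np
from scipy.signal import hilbert


def amplitude_phase(X):
    """Return (x_hat, A, phi) for a real array X of shape (channels, samples)."""
    Z = hilbert(X, axis=-1)  # analytic signal x(t) + j * x_hat(t)
    x_hat = np.imag(Z)       # Hilbert transform of each channel
    A = np.abs(Z)            # instantaneous amplitude
    phi = np.angle(Z)        # instantaneous phase (four-quadrant arctangent)
    return x_hat, A, phi
```

For a pure cosine containing an integer number of cycles, this returns the matching sine as `x_hat` and a constant amplitude of 1.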

C. Joint Amplitude and Phase Feature Extraction Method Based on Filter Bank and Riemannian Manifold
In our study, we propose a joint amplitude and phase feature extraction method based on filter bank and Riemannian manifold (AP-FBRM), which is shown in Fig. 2. The detailed processes of the training stage and the test stage are given in Algorithm 2 and Algorithm 3, respectively. The core idea of the filter bank is to process the signal and extract features in each frequency band separately in order to cope with the variability across frequency bands. Given the i-th trial of training data $X_i \in \mathbb{R}^{C \times T}$ out of N trials, we apply M filters to obtain the signals in the different frequency bands

$$\{X_{i,1}, X_{i,2}, \ldots, X_{i,M}\}.$$

Let $X_{i,j}$ be the i-th trial of training data filtered by the j-th bandpass filter. We extract the amplitude and phase information and obtain

$$\{X_{i,j}, \tilde{X}_{i,j}\}, \quad j = 1, 2, \ldots, M,$$

where $\tilde{X}_{i,j}$ is computed as described in equations (9) and (10). Next, the covariance matrices of $\{X_{i,j}, \tilde{X}_{i,j}\}$ are calculated following equation (5), which gives

$$\{V_{i,j}, \tilde{V}_{i,j}\}.$$

Following the feature extraction method on the Riemannian manifold described above, for the j-th frequency band, four intra-class Riemannian means of the two target classes are obtained by equation (4):

$$\{V_j^{1}, V_j^{2}, \tilde{V}_j^{1}, \tilde{V}_j^{2}\}.$$

Thus, for $\{V_{i,j}, \tilde{V}_{i,j}\} \in \mathbb{R}^{C \times C}$, the Riemannian distances from the four means are constructed as features

$$t_{i,j} = \left[t_{i,j}^{1},\ t_{i,j}^{2},\ \tilde{t}_{i,j}^{1},\ \tilde{t}_{i,j}^{2}\right],$$

where

$$t_{i,j}^{d} = \delta_R\left(V_{i,j}, V_j^{d}\right), \qquad \tilde{t}_{i,j}^{d} = \delta_R\left(\tilde{V}_{i,j}, \tilde{V}_j^{d}\right),$$

with d = 1, 2 for our binary attentional classification. Finally, for one unlabeled EEG trial $X_i$, the total number of features is 4M, where M denotes the number of filters in the filter bank.

Algorithm 2 (training stage; only fragments are recoverable): ... following equation (8) ... following equation (7), with j = 1, 2, ..., M; 5: Extract the distances from intra-class means as features.

Algorithm 3 (test stage; only fragments are recoverable): 1: Implement a filter bank with M filters to obtain the EEG signal in the various frequency bands. Extract the amplitude and phase information $[X_j, \tilde{X}_j]$. Feed the feature vectors $v_j$ into the trained classifier to obtain the predicted label $z_X$ for X. Output: The predicted label $z_X$ for X.
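The per-trial feature extraction can be sketched as below. This is an illustration under several assumptions, not the authors' implementation: the function and argument names are hypothetical, the intra-class means are assumed to have been estimated beforehand, zero-phase filtering with `sosfiltfilt` is an arbitrary choice, and the Hilbert transform $\hat{x}(t)$ is used as the phase-related companion signal purely for illustration. Note also that `butter(2, ...)` with a bandpass type yields a fourth-order filter overall in SciPy's convention.

```python
import numpy as np
from scipy.linalg import eigvalsh
from scipy.signal import butter, hilbert, sosfiltfilt


def riemann_distance(P1, P2):
    """Riemannian distance between two SPD matrices."""
    return np.sqrt(np.sum(np.log(eigvalsh(P2, P1)) ** 2))


def covariance(X):
    """Sample covariance of X with shape (channels, samples), zero mean assumed."""
    return X @ X.T / (X.shape[1] - 1)


def apfbrm_features(X, bands, means, means_tilde, fs=200.0):
    """Feature vector of length 4 * len(bands) for one trial X (channels, samples).

    bands          : list of (low, high) band edges in Hz
    means[j][d]    : intra-class mean for band j, class d (filtered signal)
    means_tilde[j][d] : same, for the phase-related companion signal
    """
    feats = []
    for j, (low, high) in enumerate(bands):
        sos = butter(2, [low, high], btype="bandpass", fs=fs, output="sos")
        Xj = sosfiltfilt(sos, X, axis=-1)
        # Companion signal: Hilbert transform of the band-limited trial
        # (one of several possible choices; assumed here for illustration).
        Xj_tilde = np.imag(hilbert(Xj, axis=-1))
        V, V_tilde = covariance(Xj), covariance(Xj_tilde)
        feats += [riemann_distance(V, means[j][d]) for d in range(2)]
        feats += [riemann_distance(V_tilde, means_tilde[j][d]) for d in range(2)]
    return np.asarray(feats)
```

With seven bands, for example, the returned vector has 28 entries, matching the 4M count given above.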

D. Support Vector Machine
Support vector machine (SVM) is one of the most popular classifiers in cognitive classification tasks [22]. SVM operates on a straightforward principle: it seeks a hyperplane as a decision boundary that maximizes the distance between the positive and negative samples in the feature space [34]. Owing to its strong classification ability, we take SVM as the classifier in this paper.
Given input data $D = \{d_1, d_2, \ldots, d_n\}$ and binary learning targets $y_i \in \{-1, 1\}$, the features of the input data constitute a feature space. If a decision boundary $\omega^{T} d + b = 0$ exists in the feature space with a normal vector ω and an intercept b, SVM seeks the hyperplane that maximizes the margin between the two classes. This can be written as the optimization problem

$$\min_{\omega, b, \xi}\ \frac{1}{2}\|\omega\|^2 + c \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i\left(\omega^{T} d_i + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, n,$$

where $\xi_i$ is called the relaxation variable and c denotes a hyperparameter penalty coefficient. With the relaxation variable, the SVM finds a hyperplane that tolerates minor classification errors, which mitigates overfitting to some extent. In this paper, a c-SVC is used, in which the penalty parameter c is set to 2. A polynomial kernel is used, whose kernel function is given by $k(u, v) = \left(\frac{1}{2} u^{T} v\right)^{3}$, where u and v represent two feature vectors in the original space. All other settings remain at their defaults.
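The kernel above is simple enough to write out directly. The sketch below computes the Gram matrix for $k(u, v) = (\frac{1}{2} u^{T} v)^{3}$ with NumPy; the function name and parameter names are hypothetical, and the SVM solver itself is not shown.

```python
import numpy as np


def poly_kernel(U, V, gamma=0.5, degree=3):
    """Gram matrix of k(u, v) = (gamma * u^T v) ** degree.

    U has shape (n_u, n_features), V has shape (n_v, n_features);
    the result has shape (n_u, n_v).
    """
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    return (gamma * (U @ V.T)) ** degree
```

In scikit-learn, the same kernel would presumably correspond to `SVC(C=2, kernel="poly", degree=3, gamma=0.5, coef0=0.0)`, since that library defines its polynomial kernel as `(gamma * <u, v> + coef0) ** degree`; whether this matches the toolbox actually used in the experiments is not stated in the text.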

A. Dataset Description
An open-access dataset [35] provided by Shin et al. is used in our experiments. This dataset contains both EEG and near-infrared spectroscopy (NIRS) signals collected from twenty-nine healthy subjects, from which we take only the EEG data for validation in our study. Hence, only the EEG part of the dataset is described in the subsequent text.
During the data acquisition experiments, subjects are asked to take part in two tasks: motor imagery and mental arithmetic. For our experiments, we use only the signals from the mental arithmetic task to realize attentional state classification. The mental arithmetic task is composed of two kinds of session: a task session and a baseline session. In each session, subjects first take a 1-minute rest for preparation and then perform 20 repetitions of the task, with resting periods of random length from 15 to 17 seconds.
In a single trial of the mental arithmetic task, an instruction is shown on the screen in the form of a subtraction formula, for example '923-9'. Subjects are required to remember the numbers within 2 seconds. Once the beep sounds, the instruction disappears and the subjects are expected to mentally subtract the one-digit number from the previous result and repeat the process. For the baseline session, subjects need to rest without thinking. Subjects are asked to sit still during the recording to avoid motion artifacts. The task and baseline periods both end with a beep and a "STOP" instruction.

B. Experimental Setup
Given the source EEG data recorded from the subjects, a common average re-reference is first conducted. Next, the re-referenced data is bandpass filtered to 0.5-50 Hz with a fourth-order Chebyshev type II filter.
The data is downsampled to 200 Hz, and we segment the data of the task stage into 2-second epochs, as shown in Fig. 3. The EEG signal recorded during the mental arithmetic task is labeled as attentional data, and the data from the baseline session is labeled as inattentional data. Data from thirty channels is used for validation. To remove the electrooculogram (EOG) artifacts, an automatic toolbox [36] based on independent component analysis (ICA) is employed. The experiments, including the pre-processing, are conducted using MATLAB R2020b. All experimental results are obtained by a 10 × 10 cross-validation.
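The pre-processing chain can be sketched in Python as follows (the experiments themselves were run in MATLAB). Several values here are assumptions that the text does not specify: the input sampling rate `fs_in`, the stopband attenuation `rs` required by the Chebyshev type II design, zero-phase filtering, and plain decimation for downsampling. In SciPy, the band edges passed to `cheby2` are stopband edges, and a bandpass design of order 4 yields an eighth-order filter overall. The ICA-based EOG removal step is not included.

```python
import numpy as np
from scipy.signal import cheby2, sosfiltfilt


def preprocess(raw, fs_in=1000, fs_out=200, epoch_sec=2.0, rs=40.0):
    """Re-reference, bandpass, downsample, and epoch raw EEG (channels, samples).

    Returns an array of shape (n_epochs, channels, samples_per_epoch).
    """
    # Common average reference: subtract the mean across channels.
    car = raw - raw.mean(axis=0, keepdims=True)
    # Chebyshev type II bandpass, 0.5-50 Hz (rs is an assumed attenuation in dB).
    sos = cheby2(4, rs, [0.5, 50.0], btype="bandpass", fs=fs_in, output="sos")
    filt = sosfiltfilt(sos, car, axis=-1)
    # Downsample by keeping every k-th sample; the signal is already
    # band-limited well below the new Nyquist frequency.
    step = fs_in // fs_out
    down = filt[:, ::step]
    # Cut into non-overlapping epochs and drop any incomplete tail.
    n = int(epoch_sec * fs_out)
    n_epochs = down.shape[1] // n
    epochs = down[:, : n_epochs * n].reshape(down.shape[0], n_epochs, n)
    return epochs.transpose(1, 0, 2)
```

With the default arguments, ten seconds of 30-channel data sampled at 1000 Hz yield five epochs of 400 samples each.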

C. Effectiveness Validation of Feature Distribution
In the preceding part of the text, we assumed that the features related to the attentional state lie in the Riemannian distance from the intra-class Riemannian mean, so we chose the Riemannian distance as the feature. We therefore first validate the effectiveness of the Riemannian distance. To obtain an intuitive validation, we adopt the visualization tool t-distributed stochastic neighbor embedding (t-SNE) [37] to visualize the distribution of features. Fig. 4 shows the visualization result of the Riemannian distance features extracted from only the time-domain signal x(t) within the frequency band 4-32 Hz, without the filter bank operation. In addition, to assess the effectiveness of our proposed AP-FBRM feature extraction method, the proposed features following Algorithm 3 are presented using t-SNE in the same figure. As shown in Fig. 4, the features of attentional and non-attentional states present different distributions, which preliminarily validates the effectiveness of the Riemannian distance as a feature extraction method. Moreover, to compare the visualization effect between AP-FBRM and the conventional counterpart, which overlooks the frequency- and phase-domain information, we calculate the normalized Laplacian scores [38] as an indicator. Combining the numerical results and the visualization results, we observe improvements for some of the subjects, such as subjects 8, 9, 24, and 27, while for most of the subjects the visualization improvements are not obvious. Considering that Fig. 4 shows only an embedding of the original features, which is subject to inevitable information loss, more rigorous validations are required and are presented later. Notably, for subject 12 and subject 20, the purple and yellow dots almost mix together and cannot be separated easily, which is later reflected in the poor classification accuracies for these two subjects.

D. Effects of Filter Banks
To study the effects of the filter bank setting, experiments are conducted to compare the effectiveness of different filter banks on our proposed AP-FBRM method. It should be noted that here we adopt x(t) and $\Phi(t)$ to extract both amplitude and phase information. The filter bank we use is composed of filters with the same bandwidth and a fixed frequency stepsize. By varying the bandwidth and stepsize as shown in Table I, a series of different filter banks are produced and tested. The frequency range we consider in this study is 4-32 Hz. Therefore, the number of filters is determined by the setting of the filter bank. To seek the best bandwidth for the filter bank, we choose a bandwidth of 4n Hz, where n = 1, 2, ..., 5, for a fixed frequency stepsize of 4 Hz. In addition, to explore the effect of the stepsize of the filter bank, we choose a stepsize of 4m Hz, where m = 0.5, 1, 2, 3, for a fixed bandwidth of 4 Hz. In the experiments, a second-order Butterworth filter is utilized to realize the filter banks. The results of the various filter banks are displayed in Fig. 5. The numerical results of the average accuracies are presented in Table II. As we can see, with a fixed stepsize, the classification accuracy decreases as the bandwidth increases. The maximum accuracy of 88.06% is obtained when n = 1. For a fixed bandwidth, the accuracy first increases and then decreases as the stepsize becomes larger. The average accuracy reaches its highest value when m = 1. Therefore, we take n = 1 and m = 1 as the default experimental setting in the rest of this paper.
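To make the relation between bandwidth, stepsize, and the number of filters concrete, the sketch below enumerates the band edges of a filter bank over 4-32 Hz. The layout rule is an assumption (the first band starts at 4 Hz, and bands are added while they still fit below 32 Hz); it is not taken from Table I, and the function name is hypothetical.

```python
def filter_bank(bandwidth, step, f_min=4.0, f_max=32.0):
    """List the (low, high) band edges in Hz for a given bandwidth and stepsize."""
    bands = []
    low = float(f_min)
    # Add bands while the upper edge stays within the frequency range.
    while low + bandwidth <= f_max + 1e-9:
        bands.append((low, low + bandwidth))
        low += step
    return bands
```

Under this assumed layout, the default setting (n = 1, m = 1, i.e. `filter_bank(4, 4)`) gives seven non-overlapping bands from 4-8 Hz to 28-32 Hz, and a 2 Hz stepsize (m = 0.5) gives thirteen overlapping bands.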

E. Effects of Phase Information
It is also necessary to validate the effect of phase information in the proposed attentional classification method. We first compare the results under two situations: extracting only the amplitude information A(t), and extracting only the phase information φ(t). Moreover, as mentioned before, we compare different techniques for extracting both amplitude and phase information to determine the optimal one. The three proposed techniques are to extract x(t) and φ(t), to extract x(t) and $\hat{x}(t)$, and to extract x(t) and $\Phi(t)$, all of which contain both amplitude and phase information. To facilitate the following discussion, each of the above techniques is assigned a unique index, as shown in Table III. The accuracies obtained with the various amplitude and phase information techniques are displayed in Fig. 6, where the accuracy is mapped to the color bar. For each subject, the technique that achieves the best classification accuracy is highlighted with a red box. It can be observed that most of the red boxes are located in the dashed box, which is the area of the proposed techniques. Specifically, for 26 out of the 29 subjects, the feature extraction technique achieving the best accuracy is one of the three proposed techniques, which extract features containing both amplitude and phase information. Average classification accuracies are shown in Fig. 7. It can be observed that extracting only phase information achieves better results than the validated chance level of 56.7%, which is defined in [39]. This preliminarily demonstrates the effectiveness of phase information. The combination of amplitude and phase information is considerably more effective than extracting only amplitude information. Among the three proposed techniques, the best result is achieved by technique 5, which extracts x(t) and $\Phi(t)$. The superiority of technique 5 is statistically verified (better than technique 3 with p < 0.001, and better than technique 4 with p < 0.1).

Fig. 6. Classification accuracy of different techniques to extract amplitude and phase information for all 29 subjects. Squares in red boxes mark the technique achieving the best classification accuracy for each subject. Squares inside the dashed line are the results of the 3 proposed techniques that extract both kinds of information. The x-axis indicates the index of subjects, while the y-axis indicates the index of techniques, which are, respectively, to extract A(t), to extract φ(t), and the 3 proposed techniques that extract both kinds of information: to extract x(t) and φ(t), to extract x(t) and $\hat{x}(t)$, and to extract x(t) and $\Phi(t)$ according to equation (9) for an EEG signal.

F. Comparison With AP-FBCSP
To further analyze the proposed AP-FBRM method, we compare the performance of AP-FBRM with the filter bank common spatial pattern (FBCSP) method [40]. FBCSP is a traditional EEG classification method that combines a filter bank with spatial filtering. In this respect, the proposed AP-FBRM method is similar to FBCSP. Therefore, FBCSP is used as a baseline to validate the effectiveness of our Riemannian geometry-based method in place of a spatial filter. It would be unfair to compare the traditional FBCSP directly with the proposed AP-FBRM, because the traditional FBCSP does not use phase information. Therefore, we extend it by extracting both amplitude and phase information, and we call the extended method AP-FBCSP. To ensure the fairness of the comparison, the filter bank setting and the number of features are kept the same in both methods. The classification accuracy of the 29 subjects is shown in Table IV. The three proposed amplitude and phase information extraction techniques, indexed 3, 4, and 5 in Table III, are used for the comparison. As we can see, for 25 out of 29 subjects, the proposed AP-FBRM achieves the best result. Regardless of which technique is used, the proposed method achieves higher average accuracy than the AP-FBCSP method (p < 0.001). For some subjects, such as subject 17 and subject 21, the proposed Riemannian geometry-based method achieves significant improvements of 14.33% and 10.66%, respectively. These results indicate that our Riemannian geometry-based method performs better than the CSP-based method, which suggests that the proposed Riemannian framework is well suited to attentional classification. For AP-FBRM, technique 5 turns out to be the best choice for 20 out of the 29 subjects and achieves the highest average detection accuracy of 88.06%. In comparison, for AP-FBCSP, technique 3 achieves the best classification result of 82.97%. Therefore, the proposed AP-FBRM method achieves an improvement of 5.09% compared with AP-FBCSP.

G. Comparison With Existing Studies
To further analyze the performance of the proposed AP-FBRM method, we compare its results with those of other existing methods. To rule out dependence on a particular dataset and to verify the generalization performance of the proposed method, the comparison is also carried out on another dataset [41], which contains 26 subjects. This dataset contains three cognitive tasks, from which we take only the EEG data recorded during the word generation (WG) task for validation. In the WG task, subjects are asked to continuously think of words beginning with a previously given letter, while in the baseline task, they are asked to sit still without thinking. Data recorded during the WG task is labeled as attentional data, while data recorded during the baseline task is labeled as inattentional data. The EEG data is also segmented into 2-second epochs, and data from 28 channels is used for validation. A wide range of methods are taken as baselines in this paper to make the comparison solid. Eight methods in total are selected, which represent typical spatial analysis, machine learning, Riemannian tangent space analysis, and other approaches. A brief introduction of these eight methods is given as follows.
CSP [42]: Common Spatial Pattern (CSP) is an effective algorithm to construct optimal spatial filters for binary EEG classification.
FBCSP [40]: Filter Bank Common Spatial Pattern (FBCSP). The combination of the filter bank and the CSP algorithm is proposed to address the problem of operational frequency band variance for EEG classification.
HSS-ELM [43]: Inspired by the extreme learning machine (ELM), a hierarchical semi-supervised extreme learning machine (HSS-ELM) is proposed to extract high-level features using ELM.
CSP-shrinkageLDA [35]: A method combining CSP and shrinkage LDA is utilized to classify EEG data in different states (mental arithmetic and rest), and it achieves good results.
TSLDA [27]: Tangent space LDA (TSLDA) projects covariance matrices into the Riemannian tangent space. A feature selection method is performed and an LDA is utilized for classification.
CCSP [44]: Correlation-based CSP (CCSP) utilizes time correlation between various classes of signals as the prior information and generates a regularization term, which outperforms the traditional CSP.
LR-TSTL [45]: Logistic regression with tangent space-based transfer learning (LR-TSTL) is proposed for motor imagery (MI)-based BCI classification problems. Tangent space features are extracted and then classified by a logistic regression model.
AP-FBCSP: AP-FBCSP is an extension of FBCSP that integrates phase domain information, and it is proposed in this paper. More details are described in the previous subsection. We take the best classification accuracy of the AP-FBCSP method among the various techniques as the comparison result.
Table V shows the classification accuracy using the proposed AP-FBRM method compared with existing methods, where dataset I is provided in [35] while dataset II is provided in [41]. Compared with existing methods (CSP, FBCSP, HSS-ELM, CSP-shrinkage LDA, TSLDA, CCSP, LR-TSTL), the proposed AP-FBRM method achieves the best classification accuracies ( p < 0.001), which validates the effectiveness of our method. Moreover, the accuracy improvements of 5.09% and 5.15% compared with the AP-FBCSP method mean that our Riemannian geometry-based method performs better than the CSP-based method, which reflects the significance of the framework on the basis of the Riemannian manifold. Furthermore, the application of filter bank and phase information not only explores the potential of EEG data but also increases the number of features leading to better classification results, which can be observed by comparing the accuracies of CSP (77.23% and 68.90%), FBCSP (79.05% and 72.94%) and AP-FBCSP (82.97% and 74.85%) methods.

IV. DISCUSSIONS
The proposed AP-FBRM method takes Riemannian distances from intra-class Riemannian means as robust features, which replace traditional spatial filtering such as CSP. No parameters need to be set in the Riemannian distance calculation. Combining the filter bank with phase information allows the method to draw on more of the EEG data: the phase information complements the amplitude information, and the filter bank handles the variability across frequency bands by processing each band separately. The proposed method achieves good accuracy using a polynomial kernel SVM classifier, and the results compare favorably with those of existing methods.
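The distance-based feature described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: it assumes the affine-invariant Riemannian metric commonly used for SPD matrices, and the helper names (`riemannian_distance`, `distance_features`) are hypothetical.

```python
import numpy as np


def _spd_power(P, power):
    """Raise an SPD matrix to a real power via its eigendecomposition."""
    w, V = np.linalg.eigh(P)
    return (V * w ** power) @ V.T


def riemannian_distance(P1, P2):
    """Affine-invariant distance between two SPD matrices (assumed metric).

    Computed as sqrt(sum_i log^2(lambda_i)), where lambda_i are the
    eigenvalues of P1^{-1/2} P2 P1^{-1/2}.
    """
    inv_sqrt = _spd_power(P1, -0.5)
    lam = np.linalg.eigvalsh(inv_sqrt @ P2 @ inv_sqrt)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def distance_features(P, class_means):
    """Return one feature per intra-class mean: the distance from that mean to P."""
    return np.array([riemannian_distance(M, P) for M in class_means])


# Toy example with two hypothetical class means and one trial covariance.
mean_a = np.eye(2)
mean_b = np.diag([4.0, 1.0])
trial_cov = np.diag([4.0, 1.0])
features = distance_features(trial_cov, [mean_a, mean_b])
```

Each Riemannian mean contributes exactly one feature, so a two-class problem yields two features per manifold; no hyperparameter enters the distance itself.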
We validate the effectiveness of the proposed method through experiments. Using the visualization tool t-SNE, Fig. 4 shows the distribution of the simple Riemannian distance features and of the features extracted by the proposed method. The distribution confirms that the Riemannian distance, which our method relies on, carries discriminative information. To further validate the effect of the filter bank, we design various filter banks and select the optimal filter bank set (n = 1, m = 1) according to the experimental results. Fig. 5 shows how the classification accuracy changes with the filter bank set. To validate the effect of phase information, we compare the classification results obtained with amplitude information only, phase information only, and both. As Fig. 6 shows, using both kinds of information achieves a significant improvement. For the proposed method, the optimal technique for extracting both kinds of information is to extract x(t) and (t).
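As an illustration of how amplitude and phase information can be obtained from a band-limited signal, the sketch below uses the analytic signal (an FFT-based Hilbert transform). That the three proposed techniques correspond to this construction is an assumption made here for illustration, and the function names are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal along the last axis via an FFT-based Hilbert transform."""
    x = np.asarray(x, dtype=float)
    T = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    # Keep DC (and Nyquist for even T), double positive frequencies,
    # and zero out negative frequencies.
    h = np.zeros(T)
    h[0] = 1.0
    if T % 2 == 0:
        h[T // 2] = 1.0
        h[1:T // 2] = 2.0
    else:
        h[1:(T + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h, axis=-1)


def amplitude_and_phase(x):
    """Instantaneous amplitude and phase (in radians) of each channel."""
    z = analytic_signal(x)
    return np.abs(z), np.angle(z)


# Toy example: a 5 Hz cosine sampled at 256 Hz for one second.
t = np.arange(256) / 256.0
x = np.cos(2 * np.pi * 5 * t)
amp, phase = amplitude_and_phase(x)
```

For this toy cosine the amplitude is constant and the phase advances linearly (modulo 2π), which is the information a phase-based covariance feature would draw on.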
To further analyze the proposed Riemannian geometry-based method, we first extend the FBCSP method with the three proposed techniques for extracting amplitude and phase information, and we denote the new method AP-FBCSP. We then compare it with our proposed method. Table IV presents the classification accuracies of each subject using the three techniques. The proposed method achieves a 5.09% improvement over the AP-FBCSP method, which reflects the advantage of the Riemannian geometry-based method.
The comparison with existing methods is shown in Table V. The proposed method outperforms the other methods. Moreover, the classification accuracies of CSP, FBCSP, and AP-FBCSP are ordered as CSP < FBCSP < AP-FBCSP, which also reflects the effectiveness of the filter bank and the proposed phase feature extraction techniques.
In this article, we extract the Riemannian distance as the feature, which is simple and effective [46]. However, using the Riemannian distance as the feature suffers from low dimensionality, and therefore many powerful classification algorithms cannot be applied [47]. One way to address this problem is to extract tangent space features. Covariance matrices are mapped into the tangent space located at the Riemannian mean [48]. The upper triangular elements of the mapped matrices can be extracted as tangent space features with a dimensionality of n(n + 1)/2 [27], where n denotes the dimension of the covariance matrices. The features can then be fed into classifiers. However, the high dimensionality of the tangent space features raises another problem, namely a significantly higher demand for training data [27]. In this article, we instead construct multiple Riemannian manifolds that capture spatial, frequency, and phase information, and we address the dimensionality problem by extracting Riemannian distances from these manifolds. In this way, powerful classifiers can be used, such as the SVM adopted in this article. Finally, compared with the TSLDA [27] and LR-TSTL [45] methods, which are based on tangent space features, the proposed method achieves improvements of 9.47% and 8.04% on the two datasets. These results indicate that the proposed method provides a feasible way to use the Riemannian distance as the feature.
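For comparison with the distance features, the tangent space mapping discussed above can be sketched as follows. The sketch assumes the affine-invariant logarithmic map in whitened coordinates and a √2 weighting of the off-diagonal entries, which is a common convention and not necessarily the exact one used in [27]; the function name is hypothetical.

```python
import numpy as np


def _spd_fun(P, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(P)
    return (V * fun(w)) @ V.T


def tangent_space_vector(P, P_ref):
    """Map SPD matrix P to the tangent space at P_ref and vectorize it.

    Returns a vector of length n(n + 1)/2 built from the upper triangle of
    log(P_ref^{-1/2} P P_ref^{-1/2}); off-diagonal entries are scaled by
    sqrt(2) so that the Euclidean norm of the vector equals the Frobenius
    norm of the symmetric matrix.
    """
    inv_sqrt = _spd_fun(P_ref, lambda w: w ** -0.5)
    whitened = inv_sqrt @ P @ inv_sqrt
    whitened = (whitened + whitened.T) / 2.0  # remove round-off asymmetry
    S = _spd_fun(whitened, np.log)
    rows, cols = np.triu_indices(S.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * S[rows, cols]


# Toy example: a 4 x 4 covariance mapped at the identity gives 10 features.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 50))
cov = A @ A.T / 50.0
vec = tangent_space_vector(cov, np.eye(4))
```

With n = 4 channels the vector already has 10 entries, and the count grows quadratically with n, which illustrates the demand for training data noted above; a single Riemannian distance, by contrast, is one number per reference mean.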
Although the proposed method achieves satisfactory results, some limitations need to be pointed out. First, the proposed method takes Riemannian distances as features, and each Riemannian mean yields exactly one feature. The number of Riemannian distance features is therefore less flexible than the number of CSP features. We address this limitation by combining the filter bank with the extraction of phase information. However, this leads to the second limitation: calculating a Riemannian mean is time-consuming, so the proposed method incurs a higher computational cost. Further study should focus on reducing the computational complexity, for example by using metrics that approximate the Riemannian mean.
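One candidate for such an approximation is the log-Euclidean mean, which has a closed form and needs no iteration. It is shown here only as an illustration of the idea; it is not a technique evaluated in this article, and the function name is hypothetical.

```python
import numpy as np


def _spd_fun(P, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(P)
    return (V * fun(w)) @ V.T


def log_euclidean_mean(mats):
    """Closed-form mean: matrix exponential of the averaged matrix logarithms."""
    logs = [_spd_fun(np.asarray(P, dtype=float), np.log) for P in mats]
    return _spd_fun(np.mean(logs, axis=0), np.exp)


# Toy example with two commuting SPD matrices.
mats = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]
approx_mean = log_euclidean_mean(mats)
```

For commuting matrices, as in this toy example, the log-Euclidean mean coincides with the affine-invariant Riemannian mean; in general the two differ, so the effect on classification accuracy would need to be checked experimentally.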

V. CONCLUSION
A novel AP-FBRM method was proposed in this study, and we extended its application scenario to attentional state classification. Unlike existing methods, we utilized the spatial features of EEG data instead of handcrafted features. A Riemannian geometry-based feature extraction method was employed to extract spatial information. Specifically, the proposed method extracted Riemannian distances from intra-class Riemannian means as features for their robustness to noise. In addition, to fully utilize the EEG data, both amplitude and phase information were extracted through three proposed techniques. Furthermore, by combining the filter bank, the problem of variance across frequency bands was addressed. The extracted features were fed into a polynomial kernel support vector machine to obtain the classification result. Experimental results on two open datasets validated the effectiveness of the proposed method. It achieved accuracies of 88.06 ± 5.30% and 80.00 ± 6.69% on the two datasets, outperforming existing methods.

Fig. 1. Tangent space T P of the manifold P Ω at point P.

Algorithm 1 Computation of Riemannian Mean
Input: N SPD matrices $P_1, P_2, \dots, P_N \in \mathbb{R}^{n \times n}$, the iterative threshold $\epsilon$.
Output: The Riemannian mean $P$ of N SPD matrices.
1: Initialize $P^{(1)} = I \in \mathbb{R}^{n \times n}$
2: while $\|S\|_F > \epsilon$ do
3: Compute the mean $S = \frac{1}{N} \sum_{i=1}^{N} \mathrm{Logm}_{P^{(t)}}(P_i)$ of N SPD matrices projected into the tangent space;
4:
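The iteration in Algorithm 1 can be sketched as below. Step 4 is not recoverable from the listing here, so the update used in this sketch (mapping the mean tangent vector back to the manifold with the exponential map, as in the standard fixed-point iteration for the Riemannian mean) is an assumption. The sketch also works in whitened coordinates, so its stopping test uses the norm of the whitened tangent vector rather than of S itself, and `max_iter` is a hypothetical safeguard.

```python
import numpy as np


def _spd_fun(P, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(P)
    return (V * fun(w)) @ V.T


def riemannian_mean(mats, eps=1e-8, max_iter=50):
    """Iterative Riemannian (geometric) mean of SPD matrices."""
    mats = [np.asarray(P, dtype=float) for P in mats]
    M = np.eye(mats[0].shape[0])  # step 1: start from the identity
    for _ in range(max_iter):
        sqrt_M = _spd_fun(M, np.sqrt)
        inv_sqrt_M = _spd_fun(M, lambda w: 1.0 / np.sqrt(w))
        # Step 3 (whitened form): average of log(M^{-1/2} P_i M^{-1/2}).
        S = np.mean(
            [_spd_fun(inv_sqrt_M @ P @ inv_sqrt_M, np.log) for P in mats],
            axis=0,
        )
        # Assumed step 4: move along the mean tangent vector (exponential map).
        M = sqrt_M @ _spd_fun(S, np.exp) @ sqrt_M
        if np.linalg.norm(S, "fro") <= eps:
            break
    return M


# Toy example: the mean of diag(1, 4) and diag(4, 1) is 2 * I.
mean = riemannian_mean([np.diag([1.0, 4.0]), np.diag([4.0, 1.0])])
```

Each pass requires several eigendecompositions per input matrix, which is the source of the computational cost of the Riemannian mean.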

Fig. 2. The framework of the proposed AP-FBRM attentional state classification method.

Algorithm 2 The Proposed AP-FBRM Attentional State Classification Method in the Training Stage
Input: N trials of training EEG data X i ∈ R C×T , i = 1, 2, . . ., N belonging to two classes [ 1 , 2 ];
Input: corresponding class labels z ∈ R N .
Output: Intra-class Riemannian means of the training data 2, . . .N and j = 1, 2, . . ., M;
6: Train the classifier with feature vectors v and labels z.

Algorithm 3 The Proposed AP-FBRM Attentional State Classification Method in the Test Stage
Input: Single trial of unlabeled EEG data X ∈ R C×T ;
Input: Intra-class Riemannian means of the training data

Fig. 3. Schematic diagram of the mental arithmetic of the dataset and segmented epochs used for validation.

Fig. 4. Visualization of feature distribution of 29 subjects. Green and red clusters denote the Riemannian distance features extracted from only the time-domain signal x(t) within the frequency band 4-32 Hz without the filter bank operation, and purple and yellow clusters represent features extracted by the proposed AP-FBRM feature extraction method.

Fig. 5. Classification accuracy of different sets of filter banks for all the 29 subjects. The x-axis indicates the set of n and m defined in TABLE I, while the y-axis indicates the classification accuracy. Thin gray lines denote the accuracies of individual subjects, and the thick red line denotes the average accuracy of the 29 subjects.

Fig. 7. Average classification accuracy of different techniques to extract both amplitude and phase information for all the 29 subjects. Dark blue bars denote the three proposed techniques.

TABLE II AVERAGE ACCURACY FOR DIFFERENT SETS OF n AND m

TABLE V CLASSIFICATION ACCURACY (IN PERCENTAGE) USING PROPOSED AP-FBRM METHOD AND EXISTING METHODS WHERE THE BEST RESULTS ARE MARKED IN BOLDFACE