Subject-Dependent Emotion Recognition System Based on Multidimensional Electroencephalographic Signals: A Riemannian Geometry Approach

Emotion recognition plays an important role in human-computer interaction systems, as it helps computers understand human behavior and decision-making. Using electroencephalographic (EEG) signals for emotion recognition offers a direct assessment of the inner state of the human mind. This study aims to build a subject-dependent emotion recognition system that differentiates between high and low levels of valence and arousal using multidimensional EEG signals. Our system is based on a transfer learning-minimum distance to Riemannian mean (TL-MDRM) framework. In this work, we perform two pre-processing stages. In the first stage, we analyze the EEG signals to investigate their non-Gaussianity and determine the most appropriate signal distribution. Using several statistical and goodness-of-fit tests, the T-distribution was found to be the most appropriate distribution. Covariance matrix estimation is a crucial step in manifold learning techniques, so the covariance matrix estimation technique is chosen based on the most suitable signal distribution. In the second stage, we perform transfer learning to deal with cross-session variability by generating a unique reference point for each participant and applying an affine transformation to the covariance matrices on the symmetric positive definite (SPD) manifold around this point. The results show that the TL process improved performance even when a Gaussian distribution was assumed, and that assuming a T-distribution together with TL improved performance further.


I. INTRODUCTION
Over the last few decades, human-computer interaction (HCI) systems have attracted considerably growing attention, but most of these systems are still not efficient at understanding human emotions. The ability to classify human emotional responses to different stimuli opens the door to new innovations in HCI.
Recording EEG signals requires placing multiple electrodes at specific locations on the scalp. Thanks to recent advances in technology, EEG capture devices have become wearable, portable, easy to use, and even wireless. This makes EEG signals very attractive, as their acquisition is noninvasive, fast, and inexpensive. EEG-based emotion recognition systems have a wide range of applications, such as e-learning [11], e-health care [12], and entertainment and gaming [13].
EEG-based emotion recognition systems extract features from the time domain, the frequency domain, or the joint time-frequency domain. In time-domain techniques, statistical features such as signal power, energy, and entropy have been used extensively [14]. In the frequency domain, features are extracted from different frequency bands and combined to form the feature vectors used for classification [5], [15]. Newer techniques [16] combine both time-domain and frequency-domain features. Other EEG-based emotion recognition systems use raw EEG signals to extract useful spatial and temporal information from different channels and time samples [17]. The use of Riemannian geometry in brain-computer interfaces (BCI) and in studying brain disorders is attracting attention due to its simplicity, robustness, and accuracy. In [18], Fruehwirt et al. used Riemannian tangent space mapping to study Alzheimer's disease. Yuan et al. [19] performed epileptic seizure detection in the space of symmetric positive definite (SPD) matrices using Log-Euclidean Gaussian kernel-based sparse representation. In [20], Congedo et al. offered a complete review of the use of Riemannian geometry for EEG-based brain-computer interfaces. In our previous work on EEG-based emotion recognition [6], we used the Minimum Distance to Riemannian Mean (MDRM) classifier to classify four classes of emotions, and examined different frequency bands, channel combinations, and geometric mean generation techniques.
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Real-world data tend to be corrupted with outliers and/or exhibit heavy tails. In such cases, using sample covariance matrices (i.e., assuming a Gaussian distribution) gives biased results, because the Gaussian model does not account for outliers and heavy tails. In [21], J. Charles et al. tested the EEG signal distribution and found that the Laplace distribution yields a more robust estimator than the Gaussian distribution, and in [22] they also used the Laplace distribution for better statistical reduction of multi-channel EEG data. N. Nazmia et al. [23] used goodness-of-fit tests to find the most appropriate EEG signal distribution and found that the Generalized Extreme Value distribution best describes both EEG and electromyography (EMG) signals.
EEG data recorded in different sessions and/or from different participants tend to show statistical variability. This is a challenge for brain-computer interface systems that try to reuse data from previous sessions or subjects. Transfer learning (TL) is an approach used to overcome this variability, and several studies perform transfer learning within a Riemannian geometry framework. In [24], Zanini et al. performed cross-session and cross-subject transfer learning with an approach based on modifying the MDRM classifier, a framework they called Riemannian alignment (RA)-MDRM. Zanini et al. treated cross-session/subject variability as a geometric transformation (shift) of covariance matrices on the Riemannian manifold with respect to a reference state; when the brain is performing a specific task, covariance matrices are shifted in the same direction over the SPD manifold. RA-MDRM showed improvement over the standard MDRM classifier on both motor imagery and event-related potential datasets. He et al. [25] aligned EEG trials in Euclidean space, which makes them more similar and enhances learning performance for new participants; feature extraction and classification are then performed on the aligned data. Working in Euclidean space gave them the advantages of faster computation and the freedom to use various classifiers. In [26], P. Rodrigues et al. matched the statistical distributions of two datasets using a three-step geometric transformation (translation, scaling, and rotation) to make the shapes of the two distributions as similar as possible. In [27], Lin et al. proposed a machine-learning strategy called the robust principal component analysis (RPCA)-embedded transfer learning (TL) framework, which aims to generate a personalized cross-day emotion-classification model with less labeled data while avoiding intra- and inter-individual differences.
They used the Riemannian distance to measure between-session similarity and thereby paired the most similar auxiliary source sessions with a target session for TL. Yair et al. [28] proposed an unsupervised approach for domain adaptation using parallel transport on the cone manifold of SPD matrices. In [29], Wang et al. proposed a domain adaptation SPD matrix network (daSPDnet) to solve the subject-independent emotion recognition problem. They combined prototype learning with the Riemannian metric and designed a new prototype loss, which computes the geometric mean of the SPD matrix set in the low-dimensional representation layer. Their daSPDnet can extract an intrinsic emotional representation shared across different subjects.
The objective of our work is to offer a TL-MDRM framework to classify human emotions and to study how using the most appropriate covariance matrix estimation technique (chosen by determining the closest EEG signal distribution) affects system performance. A two-step process was performed. In the first step, we analyze the EEG signals using multiple statistical tests to show that the signals are heavy-tailed; then, using two goodness-of-fit tests, we compare the Gaussian distribution, the T-distribution, and the Laplace distribution to find the most appropriate signal distribution. In the second step, we perform cross-session transfer learning, in which we use the pre-trial baseline signals to generate a unique reference point for each participant. The points on the SPD manifold are then shifted towards this reference point to overcome the variability between different observations. Two experiments were carried out, using two different channel configurations on five different frequency bands.
The rest of this paper is organized as follows. Section II introduces the basic concepts of Riemannian geometry and transfer learning. In Section III, EEG signal analysis is performed using several statistical tests and two goodness-of-fit tests. Section IV introduces our complete methodology. Results and discussions are reported in Section V. Section VI concludes the paper and draws some perspectives for the work.

B. RIEMANNIAN DISTANCE
A non-singular square covariance matrix $A \in \mathbb{R}^{N \times N}$ belongs to the set of symmetric positive-definite (SPD) matrices. These SPD matrices form a connected Riemannian manifold $\mathrm{Sym}^+_N$ [30]. Because the Euclidean distance does not account for the curvature of the manifold, it is not appropriate for measuring the distance between two points $A, B \in \mathrm{Sym}^+_N$. Instead, the distance is defined as the length of the unique shortest path connecting the two points; this path is called the geodesic from A to B [31], [32].
There are several metrics for measuring geodesic distances, and each is more or less suitable depending on the application. For any two SPD matrices A and B, the Affine Invariant Riemannian Metric (AIRM) distance between them is defined as [30], [33]:

$$\delta_R(A, B) = \left\| \log\left(A^{-1/2} B A^{-1/2}\right) \right\|_F = \left[ \sum_{n=1}^{N} \log^2 \lambda_n \right]^{1/2}, \tag{1}$$

where $\|\cdot\|_F$ is the Frobenius norm of a matrix and $\lambda_1, \dots, \lambda_N$ are the eigenvalues of $A^{-1}B$. An important property of the Riemannian distance in Eq. (1) is its invariance to affine transformations by any invertible matrix $D \in \mathbb{R}^{N \times N}$:

$$\delta_R\left(D^T A D, D^T B D\right) = \delta_R(A, B). \tag{2}$$

This property is called congruence invariance: the distance between any two SPD matrices is invariant with respect to any linear invertible transformation in the data space [24], [26]. It will be used in the transfer learning process (Section IV-E).
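As an illustration of Eq. (1) and of congruence invariance, the following minimal NumPy sketch computes the AIRM distance from the eigenvalues of $A^{-1/2} B A^{-1/2}$ and checks numerically that it does not change when both matrices are transformed as $A \mapsto D^T A D$. It is not the authors' implementation; the helper names (`spd_power`, `airm_distance`, `random_spd`) and the random test matrices are hypothetical.

```python
import numpy as np

def spd_power(A, p):
    """A**p for a symmetric positive-definite matrix A, via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

def airm_distance(A, B):
    """Affine-invariant Riemannian distance ||log(A^{-1/2} B A^{-1/2})||_F of Eq. (1)."""
    A_isqrt = spd_power(A, -0.5)
    W = A_isqrt @ B @ A_isqrt
    lam = np.linalg.eigvalsh((W + W.T) / 2)  # symmetrize to absorb round-off
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def random_spd(n, rng):
    """Hypothetical helper: a well-conditioned random SPD matrix for testing."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

rng = np.random.default_rng(0)
A, B = random_spd(4, rng), random_spd(4, rng)
D = rng.standard_normal((4, 4)) + 4 * np.eye(4)          # an (almost surely) invertible matrix
d_original = airm_distance(A, B)
d_transformed = airm_distance(D.T @ A @ D, D.T @ B @ D)  # congruence transform, Eq. (2)
print(d_original, d_transformed)                         # the two values agree up to round-off
```

Working with eigenvalues rather than a general matrix logarithm keeps the computation real-valued and exploits the symmetry of SPD matrices.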

C. MEAN OF SPD MATRICES
The geometric mean is a suitable descriptor for the center of mass of points on the SPD manifold. The Riemannian center of mass of m elements $A_1, \dots, A_m$, called the Karcher mean [32], [34], is defined as:

$$G = \arg\min_{X \in \mathrm{Sym}^+_N} \sum_{i=1}^{m} \delta_R^2\left(X, A_i\right), \tag{3}$$

where $\delta_R(\cdot, \cdot)$ is defined in Eq. (1). The notation $\arg\min_X f(X)$ denotes the point $X_0$ at which the function f reaches its minimum value. The minimum in Eq. (3) is attained at a unique point G, which represents the geometric mean and is the solution of the matrix equation:

$$\sum_{i=1}^{m} \log\left(G^{-1/2} A_i G^{-1/2}\right) = 0. \tag{4}$$

Eq. (4) has a closed-form solution only for m = 2; for three or more matrices there is no closed-form solution, and iterative algorithms must be used [26], [30]. In Euclidean spaces, a large number of standard classifiers can be used, but they are not suitable in our case because $\mathrm{Sym}^+_N$ is not a linear space. Instead, a very simple and efficient classifier called Minimum Distance to Riemannian Mean (MDRM), which assigns an observation to the class with the nearest mean, can be used.
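One common way to solve Eq. (4) iteratively is the fixed-point (Riemannian gradient descent) scheme sketched below: all matrices are mapped to the tangent space at the current estimate, averaged there, and the average is mapped back to the manifold. This is a minimal sketch assuming NumPy, a unit step size, and an arithmetic-mean initialization; it is not necessarily the iterative algorithm used in [26], [30] or in this work. For m = 2 it can be checked against the closed-form geodesic midpoint $A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}$. Function names are hypothetical.

```python
import numpy as np

def _eig_fn(A, f):
    """Apply the scalar function f to the eigenvalues of a symmetric matrix A."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * f(w)) @ V.T

def karcher_mean(mats, tol=1e-10, max_iter=200):
    """Riemannian (Karcher) mean of SPD matrices by fixed-point iteration on Eq. (4)."""
    G = np.mean(mats, axis=0)                     # start from the arithmetic mean
    for _ in range(max_iter):
        G_sqrt = _eig_fn(G, np.sqrt)
        G_isqrt = _eig_fn(G, lambda w: 1.0 / np.sqrt(w))
        # Average of the matrices mapped to the tangent space at G; it is zero at the solution
        S = np.mean([_eig_fn(G_isqrt @ A @ G_isqrt, np.log) for A in mats], axis=0)
        G = G_sqrt @ _eig_fn(S, np.exp) @ G_sqrt  # map the averaged step back onto the manifold
        if np.linalg.norm(S) < tol:
            break
    return G

def geodesic_midpoint(A, B):
    """Closed-form mean for m = 2: A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    A_sqrt = _eig_fn(A, np.sqrt)
    A_isqrt = _eig_fn(A, lambda w: 1.0 / np.sqrt(w))
    return A_sqrt @ _eig_fn(A_isqrt @ B @ A_isqrt, np.sqrt) @ A_sqrt

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
print(karcher_mean([A, B]))
print(geodesic_midpoint(A, B))   # same matrix, up to the iteration tolerance
```

For commuting (e.g. diagonal) matrices the Karcher mean reduces to the element-wise geometric mean of the eigenvalues, which is a convenient sanity check.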
Let $l_1, l_2, \dots, l_k$ denote the class labels, where k is the number of classes. During the training stage, the mean $\bar{M}(l_i)$ of each class is computed. In the test stage, the geometric mean M of the new observation is computed, and the distance between M and each class mean $\bar{M}(l_i)$ is evaluated. The new observation is assigned to class $\hat{l}$ according to the classification rule:

$$\hat{l} = \arg\min_{l \in \{l_1, \dots, l_k\}} \delta_R\left(M, \bar{M}(l)\right), \tag{5}$$

where $\bar{M}(l)$ is the Riemannian mean of class l, M is the covariance matrix representing the mean of the test observation, and $\hat{l}$ is the predicted class label of M. The MDRM classifier works in the same way regardless of the data dimension (number of electrodes) and for any number of classes.
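The classification rule of Eq. (5) amounts to a nearest-class-mean classifier on the manifold. The self-contained sketch below is hypothetical (the class `MDRM` and the functions `riemann_distance` and `riemann_mean` are illustrative names, not an existing library API); it uses the AIRM distance of Eq. (1) and a fixed-point iteration for the Karcher mean of Eq. (3). The toy 2×2 diagonal matrices stand in for per-trial geometric means.

```python
import numpy as np

def _eig_fn(A, f):
    """Apply f to the eigenvalues of the symmetric matrix A."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * f(w)) @ V.T

def riemann_distance(A, B):
    """AIRM distance of Eq. (1)."""
    A_isqrt = _eig_fn(A, lambda w: 1.0 / np.sqrt(w))
    W = A_isqrt @ B @ A_isqrt
    return float(np.sqrt(np.sum(np.log(np.linalg.eigvalsh((W + W.T) / 2)) ** 2)))

def riemann_mean(mats, tol=1e-10, max_iter=200):
    """Karcher mean of Eq. (3) by fixed-point iteration."""
    G = np.mean(mats, axis=0)
    for _ in range(max_iter):
        G_sqrt, G_isqrt = _eig_fn(G, np.sqrt), _eig_fn(G, lambda w: 1.0 / np.sqrt(w))
        S = np.mean([_eig_fn(G_isqrt @ A @ G_isqrt, np.log) for A in mats], axis=0)
        G = G_sqrt @ _eig_fn(S, np.exp) @ G_sqrt
        if np.linalg.norm(S) < tol:
            break
    return G

class MDRM:
    """Minimum Distance to Riemannian Mean: nearest class mean under the AIRM distance."""

    def fit(self, mats, labels):
        # Training stage: one Riemannian mean per class label
        self.means_ = {l: riemann_mean([M for M, y in zip(mats, labels) if y == l])
                       for l in sorted(set(labels))}
        return self

    def predict(self, mats):
        # Test stage: rule of Eq. (5), the class whose mean is closest in AIRM distance
        return [min(self.means_, key=lambda l: riemann_distance(M, self.means_[l]))
                for M in mats]

train = [np.diag([1.0, 1.2]), np.diag([1.1, 0.9]), np.diag([4.0, 0.5]), np.diag([3.5, 0.6])]
labels = ["LV", "LV", "HV", "HV"]
clf = MDRM().fit(train, labels)
print(clf.predict([np.diag([1.05, 1.0]), np.diag([3.8, 0.55])]))   # ['LV', 'HV']
```

Because the rule only needs a distance and a mean, the same code applies unchanged to any number of electrodes or classes, as stated above.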

E. TRANSFER LEARNING
Zanini et al. [24] proposed a modification of the MDRM classifier, a framework they called Riemannian alignment (RA)-MDRM. They treated cross-session/subject variability as a geometric transformation (shift) of covariance matrices on the Riemannian manifold with respect to a reference state. When the brain is performing a specific task, covariance matrices are shifted in the same direction over the SPD manifold. Their method showed great improvement over the standard MDRM classifier on motor imagery and event-related potential classification problems.
In RA-MDRM, they first estimate the covariance matrix of the rest state. For motor imagery data, the rest state is the EEG recorded in the time window in which the participant is not engaged in the task, while for event-related potentials they used the non-target stimuli as the rest state.
In RA-MDRM, the covariance matrices representing the rest state, $R_k^{(n)}$, are estimated, where k indexes the covariance matrices of the rest state and n is the session number. The mean of these matrices is then computed (denoted $\bar{R}$; $\bar{R}^{(1)}$ and $\bar{R}^{(2)}$ represent the reference points for sessions 1 and 2, respectively). $\bar{R}^{(n)}$ is used as a reference point to reduce cross-session/subject variability by performing the transformation:

$$\tilde{C}_i^{(n)} = \left(\bar{R}^{(n)}\right)^{-1/2} C_i^{(n)} \left(\bar{R}^{(n)}\right)^{-1/2}, \tag{6}$$

where $C_i^{(n)}$ and $\tilde{C}_i^{(n)}$ are the i-th covariance matrix in the n-th session before and after shifting, respectively.
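Because of the congruence invariance property in Eq. (2) (with $D = (\bar{R}^{(n)})^{-1/2}$), the transformation in Eq. (6) does not change the distances between points that belong to the same session/subject, while it maps each session's reference point to the identity matrix.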
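A minimal numerical check of this statement, assuming NumPy and toy 2×2 matrices: the transformation of Eq. (6) maps the reference point itself to the identity matrix and leaves the AIRM distance between two matrices of the same session unchanged. The function names `recenter` and `riemann_distance` are hypothetical.

```python
import numpy as np

def _eig_fn(A, f):
    """Apply f to the eigenvalues of the symmetric matrix A."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * f(w)) @ V.T

def riemann_distance(A, B):
    """AIRM distance of Eq. (1)."""
    A_isqrt = _eig_fn(A, lambda w: 1.0 / np.sqrt(w))
    W = A_isqrt @ B @ A_isqrt
    return float(np.sqrt(np.sum(np.log(np.linalg.eigvalsh((W + W.T) / 2)) ** 2)))

def recenter(C, R):
    """Affine transformation of Eq. (6): R^{-1/2} C R^{-1/2}."""
    R_isqrt = _eig_fn(R, lambda w: 1.0 / np.sqrt(w))
    return R_isqrt @ C @ R_isqrt

R = np.array([[2.0, 0.3], [0.3, 1.5]])    # toy session reference point
C1 = np.array([[2.5, 0.1], [0.1, 1.2]])   # toy covariance matrices from the same session
C2 = np.array([[1.8, 0.6], [0.6, 2.0]])
print(recenter(R, R))                      # the reference point becomes the identity
print(riemann_distance(C1, C2), riemann_distance(recenter(C1, R), recenter(C2, R)))
```

Since every session is recentered on its own reference point, the sessions share a common origin (the identity) while their internal geometry is preserved.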

III. EEG SIGNAL ANALYSIS
Covariance matrix estimation is a vital step in manifold learning techniques, and several recent studies have tackled the problem of estimating covariance matrices from high-dimensional data and from heavy-tailed data. In [35], Ke et al. proposed a method for estimating a more robust sample covariance matrix by introducing element-wise and spectrum-wise truncation operators, together with their M-estimator counterparts. Wei et al. [36] proposed an estimator of the covariance matrix under weak assumptions on the underlying distribution. In this work, we focus on understanding the underlying signal distribution: we use simple approaches to choose the most appropriate covariance matrix estimation technique by first determining the correct distribution of the EEG data.
In this section, we analyze the EEG signals in the DEAP dataset to determine the most appropriate distribution for them. Based on the closest signal distribution, we choose the most accurate technique for covariance matrix generation; we used two methods for covariance matrix generation (see Section IV-D).
It is common when dealing with EEG signals to assume that they follow a Gaussian distribution. This assumption provides only a modest approximation for EEG data as a random variable [21]-[23]. In [21], J. Charles et al. performed a chi-square (χ²) test comparing the probability density function (pdf) of EEG signals against the Gaussian and Laplace distributions, and found that the Laplace distribution yields a more robust estimator. In [22], they also used the Laplace distribution for better statistical reduction of multi-channel EEG data. Nazmia et al. [23] performed two goodness-of-fit tests, Kolmogorov-Smirnov and Anderson-Darling. They tested the EEG and electromyography (EMG) signal distributions against the Exponential, Generalized Pareto, and Generalized Extreme Value distributions, and found that the Generalized Extreme Value distribution is the most appropriate distribution for describing both EMG and EEG signals.
In this work, we first perform several statistical tests on the EEG signals to determine whether they follow a Gaussian distribution. We then use two goodness-of-fit tests, Anderson-Darling (A²) and Watson (U²), to compare the EEG signal distribution against the Gaussian distribution, the Laplace distribution, and the T-distribution.

A. OUTLIERS, SKEWNESS AND KURTOSIS TESTS
Outliers are data points that differ significantly from the other values. They often indicate that the data have a heavy-tailed distribution and high skewness, in which case assuming a normal distribution is only a modest approximation.
In this part, we analyze the EEG signals (signals from all 32 participants in the DEAP dataset, 40 trials each) using different temporal window sizes to detect the percentage of outliers for each window size. Throughout this work, we define an outlier as a data value that is more than three scaled median absolute deviations (MAD) away from the median. Leys et al. [37] stated that the median absolute deviation is a more robust measure of data spread for detecting outliers than the standard deviation around the mean. The scaled median absolute deviation of a random variable A with N samples is defined as:

$$\mathrm{MAD} = b \cdot \operatorname{median}_{i}\left(\left|A_i - M\right|\right), \qquad i = 1, 2, \dots, N,$$

where M is the median of A and b is a scaling factor. Skewness and kurtosis tests are then used to describe the shape of the distribution and check the non-Gaussianity of the EEG signals.
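The outlier criterion above can be sketched with the Python standard library as follows. The value of the scaling factor is an assumption: the text does not give b, and 1.4826 (≈ 1/Φ⁻¹(3/4), which makes the scaled MAD a consistent estimate of the standard deviation for Gaussian data) is a common choice. The function name is hypothetical.

```python
import statistics

def outlier_percentage(x, n_mad=3.0, b=1.4826):
    """Percentage of samples lying more than n_mad scaled MADs away from the median.

    b = 1.4826 is an assumed scaling factor (Gaussian-consistent); the text leaves b unspecified.
    """
    med = statistics.median(x)
    scaled_mad = b * statistics.median([abs(v - med) for v in x])
    n_out = sum(1 for v in x if abs(v - med) > n_mad * scaled_mad)
    return 100.0 * n_out / len(x)

print(outlier_percentage([1.0, 2.0, 3.0, 4.0, 100.0]))   # only 100.0 is flagged: 20.0
```

Unlike a mean/standard-deviation rule, the median and MAD are themselves barely affected by the extreme value, so the outlier cannot mask itself.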
Skewness is a measure of the asymmetry of a probability distribution; a skewness close to zero indicates a symmetric distribution. A positive skewness implies a long right tail, which means that the mass of the distribution is concentrated on the left. A negative skewness, on the other hand, indicates a long left tail, with the mass concentrated on the right.
Kurtosis measures whether the data are light-tailed or heavy-tailed relative to a Gaussian distribution. The kurtosis of the Gaussian distribution equals three: a kurtosis greater than three indicates that the data distribution has heavy tails, and a kurtosis less than three means that the data have light tails.
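Given a random variable X with N samples, mean $\bar{X}$, and standard deviation s, skewness and kurtosis are calculated as follows [38]:

$$\mathrm{Skewness} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left(X_i - \bar{X}\right)^3}{s^3}, \qquad \mathrm{Kurtosis} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left(X_i - \bar{X}\right)^4}{s^4}.$$

Table 1 shows the percentage of outliers in each window (window size varies from one to eight seconds), the percentage of signals with light tails, heavy tails, long left tails, and long right tails, the median kurtosis, and the median degree of freedom.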
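For reference, the two formulas can be written directly in Python. This sketch assumes that s is the population (1/N) standard deviation, so both statistics are the uncorrected moment estimators; the text does not state which normalization was used.

```python
def skewness(x):
    """Moment skewness: mean((x - mean)^3) / s^3, with s the population (1/N) std."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def kurtosis(x):
    """Moment kurtosis: mean((x - mean)^4) / s^4; equals 3 for a Gaussian distribution."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m4 / m2 ** 2

print(skewness([0, 0, 0, 1]))    # positive: a long right tail
print(kurtosis([-1, 1, -1, 1]))  # 1.0: lighter tails than a Gaussian
```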
From Table 1, we can see that the proportion of outliers increases with the window size and is significant, so outliers cannot be overlooked and should be taken into consideration. The kurtosis values are higher than three, which means that heavy-tailed signals far outnumber light-tailed signals. The skewness is positive for slightly more than half of the signals: around 52% of the signals have a long right tail, and the rest (around 47%) have a long left tail. Together, the outlier, skewness, and kurtosis tests show that the EEG signals have heavy tails and do not follow a Gaussian distribution.

TABLE 1. Percentage of outliers in each window (window size varies from 1 to 8 seconds), signals with long right-tail, long left-tail, light-tail, heavy-tail, median Kurtosis, and median degree of freedom.

FIGURE 1. Comparison of the EEG signal distribution with the T-distribution and the Gaussian distribution. As an illustrative example, we used the EEG signal from user 19, trial 40, electrode F3, with a 4s window size.

B. GOODNESS OF FIT TESTS
In this section, we perform two goodness-of-fit tests to compare the EEG signal distribution against two distributions that are close to Gaussian but have heavier tails. The T-distribution is a probability distribution that, like the Gaussian distribution (GD), has a bell shape but has heavier tails, i.e., it tends to produce values that lie far from its mean. The heaviness of its tails is controlled by a parameter of the T-distribution called the degree of freedom: the higher the degree of freedom, the closer the distribution is to the Gaussian distribution (the median degree of freedom is shown in the last row of Table 1). Thus, a T-distribution with a small degree of freedom has a larger kurtosis than the Gaussian distribution. The Laplace distribution [39], also known as the double exponential distribution, also has a bell shape, but it is very sharply peaked at the center and is used to model symmetric data with long tails.
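The link between the degree of freedom and the tails can be made explicit with the standard kurtosis values of the three candidate distributions (textbook results, not values taken from Table 1). For a Student's t-distribution with ν degrees of freedom, the kurtosis is finite only for ν > 4:

```latex
\kappa_{\text{Gaussian}} = 3, \qquad
\kappa_{\text{Laplace}} = 6, \qquad
\kappa_{t_\nu} = 3 + \frac{6}{\nu - 4} \quad (\nu > 4), \qquad
\lim_{\nu \to \infty} \kappa_{t_\nu} = 3 .
```

For instance, ν = 10 gives κ = 4, and the kurtosis grows without bound as ν approaches 4 from above, which is why a small median degree of freedom signals heavy tails.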

1) Anderson-Darling goodness-of-fit test (A²): This test is based on the cumulative distribution function of the data. It is a modification of the Kolmogorov-Smirnov test that gives more weight to observations in the tails of the distribution, so it is more sensitive to outliers and better at detecting departures from normality, especially in the tails [40].
2) Watson statistic test (U²): This test has been suggested to be the most powerful when testing for the Laplace distribution against other symmetric distributions. It is quite powerful and is as sensitive to the tails as to the median of the empirical distribution function [41].
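To show what the two statistics measure, the sketch below computes A² and U² for a sample against a fully specified continuous CDF using only the Python standard library. It is a hypothetical illustration: it computes only the test statistics, not the p-values or critical values, which depend on whether the distribution parameters are estimated from the data. The results in this paper were obtained with lawstat and MATLAB, not with this code, and the toy sample is made up.

```python
import math

def anderson_darling(sample, cdf):
    """Anderson-Darling A^2 of a sample against a fully specified continuous CDF."""
    z = sorted(cdf(v) for v in sample)
    n = len(z)
    # 0-based form of  -n - (1/n) * sum_{i=1..n} (2i - 1) [ln z_(i) + ln(1 - z_(n+1-i))]
    s = sum((2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i])) for i in range(n))
    return -n - s / n

def watson_u2(sample, cdf):
    """Watson U^2 = W^2 - n (mean(z) - 1/2)^2, with W^2 the Cramer-von Mises statistic."""
    z = sorted(cdf(v) for v in sample)
    n = len(z)
    w2 = sum((z[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)) + 1.0 / (12 * n)
    return w2 - n * (sum(z) / n - 0.5) ** 2

def gaussian_cdf(mu=0.0, sigma=1.0):
    return lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def laplace_cdf(mu=0.0, b=1.0):
    return lambda x: (0.5 * math.exp((x - mu) / b) if x < mu
                      else 1.0 - 0.5 * math.exp(-(x - mu) / b))

sample = [-2.9, -0.8, -0.3, -0.1, 0.0, 0.2, 0.4, 0.9, 3.1]   # toy data
for name, F in [("Gaussian", gaussian_cdf()), ("Laplace", laplace_cdf())]:
    print(name, round(anderson_darling(sample, F), 3), round(watson_u2(sample, F), 3))
```

The (2i - 1) weights and the logarithms of z and 1 - z are what make A² emphasize the tails, whereas U² weights all parts of the empirical distribution function equally.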
The Anderson-Darling and Watson tests for the Laplace distribution were performed using the R package "lawstat" [42], while the Anderson-Darling tests for the Gaussian distribution and the T-distribution were performed using the MATLAB statistics toolbox. Table 2 shows the percentage of EEG signals for which the null hypothesis is not rejected at the 0.05 significance level for the Gaussian distribution, the T-distribution, and the Laplace distribution, using the Anderson-Darling and Watson tests. Fig. 1 compares the EEG signal distribution with the Gaussian distribution and the T-distribution using the probability density function (PDF), the cumulative distribution function (CDF), and the quantile-quantile plot (QQP). The results in Table 2 and Fig. 1 were generated using a 4s window size (the temporal window size used in the emotion classification task). Table 2 and Fig. 1 show that the EEG signal distribution is closer to the T-distribution than to the Gaussian or Laplace distributions.

IV. METHODOLOGY
A. DATASET
The DEAP dataset [43] is a multimodal dataset used for analyzing human affective states. The electroencephalogram (EEG) and other physiological signals of 32 individuals (16 males and 16 females, with an average age of 26.9) were recorded while each of them watched 40 one-minute music videos. EEG signals were recorded at a sampling rate of 512 Hz using 32 active AgCl electrodes placed on the scalp according to the international 10-20 positioning system [44]. The dataset was recorded in two sessions separated by a break (20 observations per session) and in two different labs: participants 1-22 were recorded in Twente and participants 23-32 in Geneva [43].
The DEAP dataset has two versions: the first is the original recording without pre-processing, while in the second, the EEG signals were downsampled to 128 Hz, electrooculography (EOG) artifacts were removed, and the signals were band-pass filtered from 4 to 45 Hz. In the pre-processed version, each observation (trial) is 63s long, of which the first 3s are baseline signals recorded before the participant is engaged in the experiment. In this work, we study EEG-based emotion classification using the pre-processed version, and the 3s baseline signals are used to generate the reference point for each participant.
We performed our experiment once on the 18 electrodes

B. EMOTION LABELING
In the DEAP dataset, participants were asked to rate each of the 40 one-minute videos with a score between 1 and 9 for the levels of arousal, valence, liking, and dominance. We divided the two-dimensional emotion plane (see Fig. 2) into four classes according to the scores given by participants for valence (V) and arousal (A). In this work, we use a score of 5 as the threshold, which is common practice when working with the DEAP dataset [7], [15]. Two different binary classification problems were introduced for subject-dependent emotion recognition: the discrimination of low/high arousal (LA/HA) and of low/high valence (LV/HV). Table 3 shows the emotion classification labels, the number of trials in the dataset that belong to each label, and the average number of trials per participant belonging to each label. Since the number of observations in each class is balanced, accuracy can be used as the performance measure.
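The thresholding step can be sketched in a few lines. This is a hypothetical helper: the text states only that 5 is the threshold, so assigning a rating of exactly 5 to the low class is an assumption.

```python
def binarize(scores, threshold=5.0, low="L", high="H"):
    """Map 1-9 self-assessment ratings to low/high labels.

    Assumption: ratings strictly above the threshold are 'high'; the text does not say
    on which side a rating of exactly 5 falls.
    """
    return [high if s > threshold else low for s in scores]

valence = [2.5, 5.0, 7.1]
arousal = [6.3, 4.9, 8.0]
print(binarize(valence, low="LV", high="HV"))   # ['LV', 'LV', 'HV']
print(binarize(arousal, low="LA", high="HA"))   # ['HA', 'LA', 'HA']
```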

C. TEMPORAL WINDOWING
The EEG signals of each trial were recorded for 63s (a 60s trial and a 3s pre-trial baseline), which is much longer than the time needed to recognize emotional states. To identify the emotional state, the EEG signals are windowed into short segments. Thammasan et al. [45] tested emotion recognition accuracy with window durations varying from 1 to 8 seconds, and their results showed that increasing the window size reduces performance. Mohammadi et al. [15] stated that emotions are held for 2 to 4 seconds and achieved their best performance using a 4s window. Throughout this work, we use a 4s window with 50% overlap in the emotion recognition stage and a 1s window with 50% overlap for reference point generation (used in the cross-session transfer learning process, Section IV-E). All covariance matrices $C_i^{(j)}$ (the i-th covariance matrix in the j-th trial) are then shifted on the manifold towards this reference point (see Section IV-E). Each trial is labeled as low/high valence (LV/HV) and low/high arousal (LA/HA), and the geometric mean of each trial (GM/Trial) is computed. In the classification process, observations are divided into a training set and a testing set. The geometric mean of each class (GM/Class) is computed from the training set, and in the testing process the geometric mean of each test observation is computed. Classification is performed using the minimum distance to Riemannian mean (MDRM) classifier (see Section IV-F).
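The window counts used in this work follow from simple arithmetic at the 128 Hz sampling rate of the pre-processed data: with a window of w samples and a hop of w/2, a segment of T samples yields (T - w)/(w/2) + 1 windows, i.e. (7680 - 512)/256 + 1 = 29 windows for the 60s trial and (384 - 128)/64 + 1 = 5 windows for the 3s baseline. A minimal sketch, assuming that windows which do not fit entirely inside the segment are discarded (the function name is hypothetical):

```python
def window_starts(n_samples, win, overlap=0.5):
    """Start indices of windows of `win` samples with the given fractional overlap."""
    step = int(win * (1 - overlap))
    return list(range(0, n_samples - win + 1, step))

FS = 128                                          # Hz, pre-processed DEAP sampling rate
trial_windows = window_starts(60 * FS, 4 * FS)    # 60 s of stimulus EEG, 4 s windows
baseline_windows = window_starts(3 * FS, FS)      # 3 s pre-trial baseline, 1 s windows
print(len(trial_windows), len(baseline_windows))  # 29 5
```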

D. COVARIANCE MATRIX ESTIMATION TECHNIQUES
The EEG signals are recorded from N electrodes, and the data of each electrode form a time series $x_k(t)$, where k = 1, ..., N. Each time-domain signal is divided into small overlapping windows (in this work, 4s windows with 50% overlap, which gives 29 windows of 512 samples each). Let $W_{ik}$ denote window i, where i = 1, ..., 29, coming from electrode k, where k = 1, ..., N. Each window is a vector containing n samples. The i-th windows of all N electrodes are stacked together to generate the 29 covariance matrices $C_i$, i = 1, ..., 29: we denote by $X \in \mathbb{R}^{N \times n}$ a given EEG epoch recorded from N electrodes with n samples per window, and the covariance matrix $C \in \mathbb{R}^{N \times N}$ between the N random variables is computed from X [30].
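A minimal NumPy sketch of this step, assuming the recording is stored as an N × T array (electrodes × samples): each window $X_i$ keeps all N rows, and `np.cov` treats each row as one variable. The random array is a synthetic stand-in for a 32-channel, 60s trial, and the function name is hypothetical.

```python
import numpy as np

def window_covariances(signal, win, step):
    """Sample covariance (N x N) of every window X_i = signal[:, s:s + win] of an N x T recording."""
    _, T = signal.shape
    starts = range(0, T - win + 1, step)
    # np.cov uses each row (electrode) as a variable and the 1/(n - 1) normalization
    return np.stack([np.cov(signal[:, s:s + win]) for s in starts])

rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 60 * 128))     # synthetic stand-in for one 60 s, 32-channel trial
covs = window_covariances(eeg, win=4 * 128, step=2 * 128)
print(covs.shape)                              # (29, 32, 32)
```

With n = 512 samples per window and N = 32 electrodes, n > N, so each window covariance is (almost surely) full rank and therefore lies on the SPD manifold.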
In Section III, we performed EEG signal analysis and showed that EEG signals are corrupted with outliers and exhibit heavy tails. In this case, the sample covariance matrix gives a biased estimate. In Section III-B, the Anderson-Darling goodness-of-fit test and the Watson statistic test showed that the EEG signals are closer to the T-distribution than to the Gaussian or Laplace distributions. In this work, we estimate the covariance matrix using two different methods: the sample covariance and the T-distribution covariance.
1) Sample Covariance: Assuming that the EEG signals follow a Gaussian distribution, and thus ignoring the effect of outliers and heavy tails, the sample covariance matrix is given by:

$$C = \frac{1}{n-1} \sum_{t=1}^{n} \left(x_t - \bar{x}\right)\left(x_t - \bar{x}\right)^T,$$

where $x_t \in \mathbb{R}^N$ is the t-th column of X and $\bar{x}$ is the mean of the n columns.
2) T-distribution Covariance: By assuming that the observed data follow a multivariate Student's t distribution, the parameters (mean vector, covariance matrix, degree of freedom, etc.) can be learned directly from the raw data via maximum likelihood estimation (MLE). In [46], Rui Zhou et al. proposed an algorithm based on the generalized expectation maximization (GEM) method to obtain the estimator. In this work, we generate the T-distribution covariance using the fit_mvt() function in the R package fitHeavyTail [47]. In [48], Rui Zhou et al. give a detailed explanation of covariance matrix estimation under heavy tails using the fit_mvt() function.
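To illustrate why a t-based estimate is less affected by outliers than the sample covariance, the sketch below fits a multivariate t model by a simplified EM (iteratively reweighted) scheme with a fixed degree of freedom ν = 4, which is an assumption made here for illustration. It is not the GEM algorithm of [46] and not the fitHeavyTail implementation: fit_mvt() also estimates ν and offers further options. The function name and the toy data are hypothetical.

```python
import numpy as np

def t_covariance(X, nu=4.0, max_iter=500, tol=1e-9):
    """Covariance of a multivariate Student's t model fitted by EM with a fixed nu.

    X: array of shape (N, n), N channels by n samples. Returns nu / (nu - 2) * scatter.
    """
    N, n = X.shape
    mu = X.mean(axis=1, keepdims=True)          # initial location
    S = np.cov(X, bias=True)                    # initial scatter matrix
    for _ in range(max_iter):
        Xc = X - mu
        d2 = np.sum(Xc * np.linalg.solve(S, Xc), axis=0)   # squared Mahalanobis distances
        w = (nu + N) / (nu + d2)                # E-step: distant samples get small weights
        mu = (X * w).sum(axis=1, keepdims=True) / w.sum()  # M-step: weighted mean
        Xc = X - mu
        S_new = (Xc * w) @ Xc.T / n             # M-step: weighted scatter matrix
        done = np.linalg.norm(S_new - S) <= tol * np.linalg.norm(S)
        S = S_new
        if done:
            break
    return nu / (nu - 2.0) * S                  # covariance of a t distribution (nu > 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))
X = np.hstack([X, [[50.0], [50.0]]])            # one gross outlier in both channels
print(np.cov(X)[0, 0], t_covariance(X)[0, 0])   # the outlier inflates mainly the sample estimate
```

The weights (ν + N)/(ν + d²) shrink towards zero for samples far from the bulk of the data, which is how the heavy-tailed model limits the influence of outliers.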

E. CROSS-SESSION TRANSFER LEARNING
In this work, we deal with the variability of EEG signals recorded from the same subject in two different sessions by finding a unique reference point for each subject and applying an affine transformation to the covariance matrices around this point. The variability between sessions results from changes in electrode positioning, environmental conditions, and the subject's physiological state. Each trial is labeled as low/high valence (LV/HV) and low/high arousal (LA/HA), and the geometric mean of each trial (GM/Trial) is computed. From the training set, the geometric mean of each class (GM/Class) is computed and stored in the database. In the testing stage, we use the observations recorded in the second session. For test trial J of user N, a reference point $R_J$ is generated from the test observation and added to the user's reference point stored in the database ($R_N$) to generate the new reference point $\bar{R}$; the covariance matrices of this test observation are then shifted around this new reference point. The geometric mean $M_J$ of this test trial is computed, and classification is performed with the minimum distance to Riemannian mean (MDRM) classifier using the participant's class geometric means stored in the database (see Section IV-F).
In this work, we performed two experiments. In the first experiment (Fig. 3), we used the 3s baseline signals from the EEG data recorded in both sessions together to generate a unique reference point for each participant. In the second experiment (Fig. 4), we used data from one session for training and the other for testing. Each participant's reference point is generated during the training stage from the baseline signals in the training set; then, during testing, the 3s baseline signal of the test observation is used to adjust the participant's reference point. A complete illustration of the two proposed experiments is given in Section IV-F. Using a reference point and shifting the covariance matrices towards it through an affine transformation helps reduce the bias between EEG data recorded in two different sessions. Fig. 5 shows the data of each class (in both sessions) before and after shifting, and Fig. 6 shows the covariance matrices of all classes in both sessions before and after shifting.

F. EMOTION CLASSIFICATION
In this work, we examine subject-dependent emotion recognition, which means that the training and test observations of each subject are used independently of the observations of other subjects. Two binary classification problems were examined: the discrimination of low/high valence (LV/HV) and of low/high arousal (LA/HA). Let $\{(M^{(1)}, l^{(1)}), \dots, (M^{(n)}, l^{(n)})\}$ be a training set of labeled observations, where $M^{(i)}$ is the center of mass of observation i in a given frequency band and $l^{(i)} \in \{HV, LV, HA, LA\}$ is the corresponding emotion label.
Emotion classification was performed using the EEG data recorded from 31 participants of the DEAP dataset. Participant 20 was excluded because his class 3 (low valence) observations exist only in session 2; there are no LV observations in his first session.

FIGURE 5. ... (Eq. (6)). Visualization obtained through t-SNE method using the Riemannian distance (Eq. (1)).

FIGURE 6.
Original covariance matrices of subject 1 (as an example) in both session 1 and session 2 before and after shifting (Eq.(6)). Visualization obtained through t-SNE method using the Riemannian distance (Eq. (1)).
1) EXPERIMENT 1
During the training stage:
1) The first 3s (rest state) of each observation is divided into one-second windows with 50% overlap (5 windows, 128 samples per window).
2) A covariance matrix is generated from each window, giving 5 covariance matrices $R_k^j$ per observation, where $k$ indexes the rest-state covariance matrices of an observation and $j$ indexes the observations.
3) The geometric mean $\bar{R}$ of the matrices $R_k^j$ is computed; $\bar{R}$ is the unique reference point of that participant.
4) The 40 observations of each participant are divided into a training set and a testing set: 70% of the trials are used for training and 30% for testing.
5) The remaining 60s of each training observation is divided into 4s windows with 50% overlap. Let $C_i^j$ denote the $i$th covariance matrix of the $j$th observation.
6) Using $\bar{R}$ as the reference point in Eq. (6), an affine transformation is applied to each covariance matrix of each training observation to overcome cross-session variability.
7) Each trial in the training set is labeled $l^{(i)} \in \{HV, LV, HA, LA\}$.
8) The geometric mean $M_i$ of each training trial is computed.
9) The geometric mean of each emotion class, $GM_{HV}$, $GM_{LV}$, $GM_{HA}$ and $GM_{LA}$, is computed from the training set.
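Steps 1, 2 and 5 amount to sliding-window segmentation followed by one covariance matrix per window. The sketch below assumes each observation is a (channels × samples) NumPy array sampled at 128 Hz (consistent with the 128 samples per one-second window stated above) and 63 s long, i.e. the 3 s baseline followed by the 60 s signal; `np.cov` gives the sample covariance, the first of the two estimators compared in this work. The function names are illustrative.

```python
import numpy as np

FS = 128  # Hz; implied by "128 samples" per one-second window


def split_trial(trial, baseline_s=3, fs=FS):
    """Split a (channels, samples) trial into the baseline and the remaining signal."""
    n_base = baseline_s * fs
    return trial[:, :n_base], trial[:, n_base:]


def sliding_windows(x, win_s, overlap=0.5, fs=FS):
    """Cut x (channels, samples) into windows of win_s seconds with the given overlap."""
    win = int(win_s * fs)
    hop = int(win * (1 - overlap))
    starts = range(0, x.shape[1] - win + 1, hop)
    return np.stack([x[:, s:s + win] for s in starts])


def window_covariances(x, win_s, overlap=0.5, fs=FS):
    """One sample covariance matrix (channels x channels) per window."""
    return np.stack([np.cov(w) for w in sliding_windows(x, win_s, overlap, fs)])
```

With these settings, the baseline yields the 5 rest-state matrices used in step 3, and the 60 s signal yields 29 overlapping 4 s windows per observation.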
During the testing stage (for each test observation $T$):
1) The last 60s is divided into 4s windows with 50% overlap. Let $C_i^T$ denote the $i$th covariance matrix of test observation $T$.

2) EXPERIMENT 2
In the second experiment (illustrated in Fig. 4), we used the observations of one session for training and those of the other session for testing. Each participant's reference point is generated during the training stage from the baseline signals in the training set; then, during testing, the 3s baseline signal of the test observation is used to adjust the participant's reference point. The training stage is the same as in Experiment 1, except that only the 20 observations recorded in the training session are used to generate the reference point and the geometric mean of each class.
During the testing stage (for each test observation $T$ of user $N$):
1) The 3s baseline signal of the test observation is used to generate a reference point $R_T$. This new reference point is used to adjust the reference point of user $N$ stored in the database, $R_N$, producing the new reference point $\bar{R}$.
2) The last 60s is divided into 4s windows with 50% overlap. Let $C_i^T$ denote the $i$th covariance matrix of test observation $T$.
3) Using the newly generated reference point $\bar{R}$ in Eq. (6), an affine transformation is applied to each covariance matrix.
4) The geometric mean $M_T$ of the test observation is computed from the shifted covariance matrices $C_i^T$.
5) Classification is performed with the minimum distance to Riemannian mean (MDRM) classifier, using the participant's class geometric means computed during the training stage.
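The testing steps can be sketched as follows. This is not the authors' code: the distance and mean assume the affine-invariant metric, and the text does not state how $R_T$ adjusts the stored reference point, so `adjust_reference` is a hypothetical choice that moves $R_N$ toward $R_T$ along the Riemannian geodesic with an illustrative `weight` parameter. The `MDRM` class assigns each test geometric mean $M_T$ to the class whose Riemannian mean is nearest.

```python
import numpy as np


def spd_fn(C, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(C)
    return (V * fn(w)) @ V.T


def riemannian_distance(A, B):
    """Affine-invariant distance ||log(A^{-1/2} B A^{-1/2})||_F (assumed Eq. (1))."""
    A_is = spd_fn(A, lambda w: 1.0 / np.sqrt(w))
    lam = np.linalg.eigvalsh(A_is @ B @ A_is)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def geometric_mean(covs, tol=1e-10, max_iter=100):
    """Karcher mean under the affine-invariant metric (fixed-point iteration)."""
    G = np.mean(covs, axis=0)
    for _ in range(max_iter):
        G_s = spd_fn(G, np.sqrt)
        G_is = spd_fn(G, lambda w: 1.0 / np.sqrt(w))
        T = np.mean([spd_fn(G_is @ C @ G_is, np.log) for C in covs], axis=0)
        G = G_s @ spd_fn(T, np.exp) @ G_s
        if np.linalg.norm(T) < tol:
            break
    return G


def adjust_reference(R_N, R_T, weight=0.5):
    """HYPOTHETICAL step 1: point at fraction `weight` along the geodesic from the stored
    reference R_N to the test-baseline reference R_T (weight=0 keeps R_N, 1 gives R_T)."""
    s = spd_fn(R_N, np.sqrt)
    s_inv = spd_fn(R_N, lambda w: 1.0 / np.sqrt(w))
    return s @ spd_fn(s_inv @ R_T @ s_inv, lambda w: w ** weight) @ s


class MDRM:
    """Minimum distance to Riemannian mean classifier."""

    def fit(self, Ms, labels):
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        # One Riemannian mean per class, from the trial means M_i of that class
        self.means_ = {c: geometric_mean([M for M, l in zip(Ms, labels) if l == c])
                       for c in self.classes_}
        return self

    def predict(self, Ms):
        # Label of the class mean closest (in Riemannian distance) to each test mean
        return np.array([min(self.classes_, key=lambda c: riemannian_distance(self.means_[c], M))
                         for M in Ms])
```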

V. RESULTS AND DISCUSSION
In this work, we used the MDRM classifier with transfer learning for subject-dependent emotion recognition based on EEG signals. Two pre-processing steps were performed. In the first step, the EEG signals were analysed with several statistical and goodness-of-fit tests, which showed that the T-distribution is the most appropriate description of the EEG signal distribution. This signal analysis step guided the choice of covariance matrix estimation technique. Two covariance matrix estimation techniques were examined: the sample covariance and the T-distribution covariance. In the second step, a cross-session transfer learning process was performed to overcome the variability between EEG data recorded in different sessions. Classification was performed using the MDRM classifier, and accuracy is used as the index of classification performance. Accuracy is a suitable index here because the average number of observations per participant in each label is almost balanced (see Table 3). Three methods were tested: sample covariance without transfer learning, sample covariance with transfer learning, and T-distribution covariance with transfer learning.
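This section does not spell out the T-distribution covariance estimator. A common choice, shown here as an assumption rather than the authors' exact method, is the maximum-likelihood scatter matrix of a zero-mean multivariate t-distribution with fixed degrees of freedom ν, computed by an EM-style fixed-point iteration that down-weights samples with a large Mahalanobis distance. The default `nu=5.0` is illustrative and not taken from the paper.

```python
import numpy as np


def t_covariance(X, nu=5.0, tol=1e-8, max_iter=200):
    """Scatter-matrix MLE of a zero-mean multivariate t-distribution with fixed nu.

    X: (channels, samples) array, assumed zero-mean (e.g. band-pass filtered EEG).
    Samples far from the bulk of the data (large Mahalanobis distance) get small weights.
    """
    p, n = X.shape
    S = X @ X.T / n  # initialise with the zero-mean sample covariance
    for _ in range(max_iter):
        # Squared Mahalanobis distance of every sample under the current estimate
        d = np.einsum("ij,ik,kj->j", X, np.linalg.inv(S), X)
        w = (nu + p) / (nu + d)  # per-sample EM weights of the t-distribution
        S_new = (X * w) @ X.T / n
        converged = np.linalg.norm(S_new - S) <= tol * np.linalg.norm(S)
        S = S_new
        if converged:
            break
    return S
```

For ν > 2 the scatter matrix is proportional to the covariance (by the factor ν/(ν − 2)). As ν grows, the weights approach 1 and the estimate approaches the zero-mean sample covariance, so the two estimators differ most on segments that contain outliers.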
Two different experiments were performed for reference point generation. The results of the first experiment (Table 4) were obtained by five-fold cross-validation, averaging the results over the folds. In each fold, we fuse the observations of each participant recorded during the two sessions, label the data into four classes (Section IV-B), shuffle the observations belonging to each class, and divide each class's observations into 70% for training and 30% for testing, ensuring that the training and test data are entirely disjoint. The results of the second experiment (Table 5) were obtained by using the first session for training and the second for testing, then the second session for training and the first for testing, and averaging the two results.
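The two evaluation protocols described above can be sketched as two small helpers: a per-class shuffle-and-split into disjoint 70%/30% index sets (which, for Experiment 1, would be repeated with five different seeds and the accuracies averaged), and the two-way session swap of Experiment 2. The function names and the `evaluate` callback are hypothetical.

```python
import numpy as np


def per_class_split(labels, train_frac=0.7, seed=None):
    """Shuffle each class separately and split it into disjoint train/test index sets."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(train_frac * len(idx)))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.sort(np.array(train)), np.sort(np.array(test))


def two_way_cross_session(evaluate, session1, session2):
    """Average score of training on one session and testing on the other, in both directions.

    `evaluate(train_data, test_data) -> accuracy` is a hypothetical user-supplied function.
    """
    return 0.5 * (evaluate(session1, session2) + evaluate(session2, session1))
```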
Tables 4 and 5 show that the transfer learning process improved the results even with the sample covariance estimation technique, and that combining the T-distribution covariance with transfer learning gave better performance in all frequency bands. The theta band gave the best performance: in Experiment 1, 87.1% for valence and 86.3% for arousal using 18 channels, and 88.78% for valence and 86.37% for arousal using 32 channels; in Experiment 2, 76.71% for valence and 75.35% for arousal using 18 channels, and 80.11% for valence and 79.74% for arousal using 32 channels. Generating the reference point from both sessions (Experiment 1) therefore gave better results than generating it from the training session alone and adjusting it with the limited information available in the test observation. In addition, using 32 channels placed over the entire scalp gave better results than using only 18 channels placed over the upper half of the scalp. Table 6 shows the results of Experiment 1 for each of the 31 participants using 18 channels.
Table 7 shows the percentage improvement obtained by using transfer learning and by using the T-distribution covariance. Using transfer learning as a pre-processing step improved the results even when the covariance matrices were estimated with the sample covariance.

VI. CONCLUSION
In this work, we presented a scheme to improve the accuracy of the MDRM classifier. We built a subject-dependent EEG emotion recognition system based on the MDRM classifier, adding a two-step pre-processing stage.
In the first step, we analyzed the EEG signals to investigate their non-Gaussianity and found that the signals are corrupted with outliers and exhibit heavy tails, and that the T-distribution is the closest to the actual EEG signal distribution. Based on this finding, the covariance matrix estimation technique was chosen. In the second step, we performed cross-session transfer learning by generating a common reference point for each participant from his rest-state signals.
Performing cross-session transfer learning improved the system performance even when using sample covariance estimation, while combining the T-distribution covariance with transfer learning gave the best results. The proposed pre-processing steps could be used in any brain-computer interface system based on SPD manifold learning techniques.

EMAN A. ABDEL-GHAFFAR was born in Cairo, Egypt, in 1976. She received the B.Sc., M.Sc., and Ph.D. degrees in electrical engineering from the Faculty of Engineering Shoubra, Benha University, Cairo, Egypt, in 1999, 2004, and 2010, respectively. Her M.Sc. research studied image compression techniques, and her Ph.D. research studied multimodal biometric fusion. She is currently a Lecturer at the Faculty of Engineering Shoubra, Benha University. Her current research interests include biometrics, brain-computer interaction, data compression, multimedia security, speech processing, and image processing.