Dictionary Learning of Symmetric Positive Definite Data Based on Riemannian Manifold Tangent Spaces and Local Homeomorphism

Symmetric Positive Definite (SPD) data are increasingly prevalent in dictionary learning. SPD data are typical non-Euclidean data and cannot constitute a Euclidean space, so many dictionary learning algorithms cannot be directly applied to them. Reproducing Kernel Hilbert Spaces (RKHS) are now commonly used to deal with this problem. However, an RKHS is an infinite-dimensional Hilbert space rather than a Euclidean space, so dictionary learning still cannot be directly applied to SPD data. In this paper, we propose a novel dictionary learning algorithm for SPD data based on the Riemannian Manifold Tangent Space (RMTS). Since the RMTS is a finite-dimensional Hilbert space, i.e., isomorphic to a Euclidean space, most machine learning algorithms developed for Euclidean space can be applied to it directly. The introduction of the RMTS thus provides a better mathematical platform for machine learning on non-Euclidean data. The proposed RMTS method first selects a point on the Riemannian manifold as an anchor point. It then transforms every other point on the manifold into the tangent vector, at the anchor point, of the geodesic connecting that point to the anchor point, with the length of the tangent vector set equal to the length of the geodesic. Such tangent vectors therefore carry an explicit geometric meaning, such as direction information, whereas the RKHS method may lose some of the geometric meaning of the original data during the mapping process. In addition, the proposed algorithm adds a regularization term enforcing a local homeomorphism between the SPD data and their RMTS dictionary coding coefficients, so that the similarities among the SPD data and the similarities among their coding coefficients stay as close as possible. Experimental results on four public datasets demonstrate that the proposed algorithm significantly outperforms five state-of-the-art algorithms. This work opens a new pathway towards dictionary learning methods for SPD data.


I. INTRODUCTION
In recent years, with the rapid development of machine learning technology, an increasing amount of non-Euclidean data is generated. Such data live on Riemannian manifolds, such as SPD manifolds, Stiefel manifolds [1], and Grassmann manifolds [2]. SPD data, a typical type of non-Euclidean data, are generally obtained by extracting region covariance descriptors from the original data. They provide a nonlinear representation of the original data, possess rich geometric characteristics, and have delivered outstanding performance in many applications, such as 3D object recognition [3], visual surveillance [4], object recognition [5], action recognition [6], medical imaging [7], and hand detection and tracking [8]. The set of all SPD matrices forms an SPD manifold, which is a nonlinear space rather than a Euclidean space. However, as traditional machine learning algorithms are typically developed for Euclidean space, they cannot be directly applied to the SPD manifold.
Dictionary learning and sparse representation (DLSR) is a powerful tool for representing data and has become increasingly prevalent in signal processing [9] and computer vision [10]. Given a set of raw data X = {x_1, ..., x_N}, DLSR assumes that the data can be effectively represented by linear combinations of a set of dictionary atoms E = {e_1, ..., e_D}. The combination coefficients A = {a_1, ..., a_N} are called the sparse representation, where sparseness means that the fewer non-zero components a coefficient vector has, the better. Therefore, for a given training dataset, DLSR algorithms aim to learn a set of ideal dictionary atoms; the learned atoms are then used to sparsely code each query datum while minimizing the reconstruction error. Although research on SPD manifold-based DLSR algorithms has attracted much attention recently, progress is still limited. DLSR is essentially a sparse linear approximation that requires a large number of linear operations, but the SPD manifold is not a linear space. Directly applying traditional DLSR algorithms to it ignores the nonlinear geometry of the original SPD data and results in poor sparse coding.
The nonlinear problem of the SPD manifold is the key to applying DLSR algorithms to SPD data. To solve this problem, there are mainly two approaches. The first approach is to solve the specific problem of DLSR by building a model directly on the original SPD manifold, i.e., both the original data and the learned dictionaries are SPD matrices. However, the linear combination of a set of SPD matrices cannot guarantee that it is still an SPD matrix. It requires many restrictions, which makes dictionary learning difficult. The second method is to map the original SPD data into the Hilbert space and perform specific machine learning tasks on the Hilbert space to solve the nonlinear problem of the SPD manifold. Currently, the main Hilbert spaces are RKHS and tangent spaces.
Nowadays, Riemannian manifolds are prevalent in machine learning applications. Riemannian manifolds are not Euclidean spaces, while most existing machine learning algorithms are developed in Euclidean space, so these algorithms cannot be directly applied to Riemannian manifolds. The commonly used workaround is the RKHS method, which first transforms the data on the Riemannian manifold into an RKHS using reproducing kernels and then performs machine learning in the RKHS. In this paper, we explore applying DLSR to SPD data, and our primary contributions are as follows:
1) We propose a machine learning method for non-Euclidean data based on Riemannian Manifold Tangent Spaces (RMTS). It converts data on Riemannian manifolds into the RMTS and then performs machine learning there. An RKHS is an infinite-dimensional Hilbert space rather than a Euclidean space, whereas the RMTS is a finite-dimensional Hilbert space, which is isomorphic to a Euclidean space. Almost all machine learning algorithms developed for Euclidean space can therefore be directly applied to the RMTS.
2) The proposed RMTS method preserves the geometric relationships among the data on the Riemannian manifold. The RKHS method converts each datum on the manifold into a nonlinear function defined on the manifold, so the form and properties of the data change significantly. The RMTS method instead converts each datum into a tangent vector at a reference point on the manifold; the tangent vector points from the reference point towards the other point.
3) We propose to apply the RMTS method to dictionary learning of SPD data. Sparse linear approximation is the most commonly used operation in dictionary learning, but SPD data do not support linear operations, so dictionary learning cannot be performed, or even expressed, directly on SPD data. We therefore first transform the dictionary learning samples of SPD data into the RMTS and then construct the dictionary learning model there.
4) We propose an SPD data dictionary learning algorithm based on the RMTS and local homeomorphic regularization (RLH-SDL). We add a local homeomorphic regularization term between the SPD data and their dictionary coding coefficients, which keeps the similarities among the SPD data as close as possible to the similarities among their coding coefficients.
The remainder of this paper is organized as follows. Section 2 introduces the notations. Section 3 reviews representative studies of dictionary learning. Section 4 explains the proposed SPD data dictionary learning framework based on the SPD Riemannian manifold tangent space and local homeomorphism (RLH-SDL). Section 5 compares the proposed algorithm with five state-of-the-art algorithms. Section 6 discusses the experimental results on four public datasets. Finally, Section 7 concludes this paper.

A. Notations
We first introduce the notations in this paper, as shown in Table 1.

B. Differential Manifolds
In general, an n-dimensional topological manifold is a topological space that is locally homeomorphic to the n-dimensional Euclidean space R^n. If M is an n-dimensional topological manifold, then M must have an atlas of coordinate charts; when the transition maps between overlapping charts are smooth, M is called a differential manifold. At each point p of M, a tangent space T_p(M) can be defined. It can be further proved that the dimension of T_p(M) is the same as that of the differential manifold M. Therefore, T_p(M) is a finite-dimensional linear space, and all finite-dimensional linear spaces are isomorphic to the Euclidean space of the same dimension. In this sense, the tangent spaces of differential manifolds are all Euclidean spaces.

C. Riemannian manifolds
There is no distance metric or linear operation between the elements of a differential manifold. Although the tangent space of a differential manifold supports linear operations, there is no distance metric between the tangent vectors. To support machine learning algorithms, the differential manifold must be assigned a Riemannian metric, making it a Riemannian manifold. A Riemannian metric is a symmetric, positive definite, and smooth second-order tensor field on the differential manifold. Since a symmetric positive definite second-order tensor field defines an inner product on the tangent space at each point of the manifold, the Riemannian metric can also be regarded as a smooth inner product field on the differential manifold. The Riemannian metric can be used to define the length of a smooth curve between any two points on the Riemannian manifold, and thereby the distance between any two points, that is, the length of the shortest smooth curve connecting them. Such a curve is called a geodesic, and such a distance is called the geodesic distance.
As explained above, the tangent space at each point of a differential manifold is a finite-dimensional linear space, and the Riemannian metric defines an inner product on the tangent space at each point. Therefore, the tangent space at each point of a Riemannian manifold is a finite-dimensional inner product space. Furthermore, finite-dimensional inner product spaces are complete, so the tangent space at each point of a Riemannian manifold is a finite-dimensional Hilbert space. A finite-dimensional Hilbert space is isomorphic to the Euclidean space of the same dimension endowed with an inner product (all the following Euclidean spaces refer to Euclidean spaces endowed with an inner product). Therefore, the tangent space at each point of a Riemannian manifold is essentially a Euclidean space, and the distance on the tangent space defined by the Riemannian metric (inner product) is the Euclidean distance. In addition, the set of SPD matrices, given a certain topological structure, becomes a differential manifold, and given a Riemannian metric, it becomes a Riemannian manifold.

D. Dictionary Coding and Learning
A dictionary is a set of basic atoms {E_1, ..., E_D}, and the function of dictionary coding is to approximate the data X by a sparse linear combination of these atoms. The objective function of dictionary coding is

    min_α || X − Σ_{d=1}^{D} α_d E_d ||² + λ · sparse(α),

where sparse(α) is the sparse regularization term of the dictionary coding. It expresses that the optimal linear approximation should use as few non-zero components of the coefficient vector α as possible. The 1-norm is usually used as the sparse regularization term [9]:

    sparse(α) = ||α||_1 = Σ_{d=1}^{D} |α_d|.

It should be noted that the sparse regularization term sparse(α) is the defining feature of dictionary coding; without it, dictionary coding degenerates into subspace learning. In that case we simply seek the datum Z in the linear subspace span{E_1, ..., E_D}, which is spanned by the atoms {E_1, ..., E_D}, that is closest to the given datum X:

    Z* = argmin_{Z ∈ span{E_1, ..., E_D}} || X − Z ||².

The prerequisite of dictionary coding is a suitable dictionary, which can be learned from given samples. Depending on the application, we collect a certain number of representative samples {X_1, ..., X_N} and use them for dictionary learning [10]:

    min_{E, A} Σ_{n=1}^{N} || X_n − Σ_{d=1}^{D} A_{nd} E_d ||² + λ Σ_{n=1}^{N} sparse(a_n),

where A is the dictionary coding matrix whose n-th row a_n is the coding of X_n. In this objective function, the dictionary coding matrix A is a by-product that is not strictly required; it is the dictionary {E_1, ..., E_D} that is needed.
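As a concrete illustration of the coding objective above, the following sketch solves min_a ||x − Ea||² + λ||a||_1 with the iterative shrinkage-thresholding algorithm (ISTA). This is only a minimal stand-in, not the solver used in this paper (which uses feature-sign search [26]); the dictionary E and the signal x here are random placeholders.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(x, E, lam=0.1, n_iter=1000):
    """Minimize ||x - E a||^2 + lam ||a||_1 by ISTA.
    E is a (d x D) dictionary with atoms as columns."""
    L = 2 * np.linalg.norm(E, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(E.shape[1])
    for _ in range(n_iter):
        grad = 2 * E.T @ (E @ a - x)         # gradient of the squared error
        a = soft_threshold(a - grad / L, lam / L)
    return a

rng = np.random.default_rng(0)
E = rng.standard_normal((20, 50))            # random placeholder dictionary
a_true = np.zeros(50); a_true[[3, 17, 40]] = [1.0, -2.0, 0.5]
x = E @ a_true                               # a 3-sparse signal
a_hat = sparse_code_ista(x, E, lam=0.01)
print(np.count_nonzero(np.abs(a_hat) > 1e-3))  # number of active atoms
```

Because ISTA is a majorization-minimization scheme, each iteration monotonically decreases the objective, so the recovered coding is never worse than the trivial all-zero coding.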

III. Related Work
With the successful application of SPD data in various fields of machine learning, the study of DLSR for SPD data has attracted much attention. Conventional DLSR algorithms are typically developed for Euclidean space. However, with the continuous development of machine learning, more and more data are non-Euclidean.
In recent years, the research of DLSR algorithms based on SPD manifolds has attracted much attention. For the nonlinear problem of SPD manifolds, there are two main solutions: 1) to build the DLSR model directly on the SPD manifold, and 2) to map the original SPD data to Hilbert space.

A. DLSR Modeled Directly on SPD Manifold
DLSR is essentially a sparse linear approximation, which requires a large number of linear operations. However, the SPD manifold is not a linear space. If traditional DLSR algorithms are performed directly on SPD manifolds, the nonlinear geometric structure of the original SPD data is ignored, leading to poor sparse coding. Therefore, appropriate linear decompositions and reconstruction error metrics for SPD matrices should be introduced. In general, to employ a linear combination of atomic matrices to represent SPD matrices, we need to choose appropriate metrics to measure the reconstruction errors, such as the LogDet divergence and the Frobenius norm. Sivalingam et al. [13] proposed a tensor sparse coding (TSC) algorithm for positive definite matrices. The algorithm strictly limits each coefficient vector to be non-negative, which ensures that the sparse coding result is also an SPD matrix, and uses the LogDet divergence to measure the coding errors. Based on TSC, Sivalingam et al. further proposed a tensor dictionary learning (TDL) algorithm [14], which likewise uses the LogDet divergence to measure the coding errors in the dictionary learning process.
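For reference, one common form of the LogDet divergence used in this line of work is the Burg (LogDet) divergence; the sketch below assumes this form, which may differ in detail from the exact definition in [13], [14].

```python
import numpy as np

def logdet_div(X, Y):
    """Burg LogDet divergence: tr(X Y^-1) - log det(X Y^-1) - n."""
    n = X.shape[0]
    M = np.linalg.solve(Y, X)                # Y^-1 X, same trace/det as X Y^-1
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - n

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)); X = A @ A.T + 4 * np.eye(4)
B = rng.standard_normal((4, 4)); Y = B @ B.T + 4 * np.eye(4)
print(logdet_div(X, X))                      # zero for identical inputs
print(logdet_div(X, Y))                      # positive; asymmetric in general
```

The divergence is non-negative and vanishes only when X = Y, but, as noted below, it is not the Riemannian geodesic distance.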
Although this approach avoids the extra computation caused by mapping SPD data into another linear space, modeling on SPD manifolds strictly limits each coding coefficient to be non-negative during dictionary learning. This imposes a significant burden on dictionary learning and greatly increases the computational complexity. In addition, the LogDet divergence is not the Riemannian geodesic distance and cannot accurately measure the distance between two SPD matrices, so using it to measure coding errors is inaccurate.
In addition, there have been many attempts to use other metrics to measure the reconstruction error. Cherian et al. [15] used Riemannian geometry to learn the dictionary. They ensured that the dictionary was composed of SPD matrices and used the affine-invariant Riemannian metric (AIRM) to measure the coding errors. The AIRM measures the distance between two SPD matrices well; however, their dictionary learning objective is a non-convex function, which leads to coding errors. Sra et al. [16] used the Frobenius norm to measure the similarity between SPD matrices in the generalized dictionary learning (GDL) algorithm, expressing each SPD matrix as a linear combination of rank-1 atomic matrices. However, the Frobenius norm is not a good metric, since it ignores the Riemannian structure of SPD manifolds.
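The AIRM distance mentioned above has a standard closed form, d(X, Y) = ||log(X^{-1/2} Y X^{-1/2})||_F; a minimal numpy sketch (random test matrices are placeholders) also verifies its defining affine-invariance property:

```python
import numpy as np

def spd_power(X, p):
    """X^p for a symmetric positive definite X via eigendecomposition."""
    w, U = np.linalg.eigh(X)
    return (U * w ** p) @ U.T

def airm_dist(X, Y):
    """AIRM geodesic distance ||log(X^{-1/2} Y X^{-1/2})||_F."""
    Xinvs = spd_power(X, -0.5)
    M = Xinvs @ Y @ Xinvs
    w = np.linalg.eigvalsh((M + M.T) / 2)    # symmetrize against round-off
    return np.sqrt(np.sum(np.log(w) ** 2))

# Affine invariance: d(X, Y) == d(G X G^T, G Y G^T) for any invertible G
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)); X = A @ A.T + 3 * np.eye(3)
B = rng.standard_normal((3, 3)); Y = B @ B.T + 3 * np.eye(3)
G = rng.standard_normal((3, 3))              # invertible with probability 1
print(airm_dist(X, Y))
print(airm_dist(G @ X @ G.T, G @ Y @ G.T))   # equal up to round-off
```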

B. SPD Manifold DLSR Methods Mapped to Hilbert spaces
Recently, when applying DLSR algorithms on SPD data, most of the algorithms map the original SPD data into the Hilbert space to solve the nonlinear problem of SPD manifolds. At present, the main Hilbert spaces are RKHS and tangent spaces.
It is well known that an RKHS is a complete inner product space, and a kernel function generates a unique RKHS. Harandi et al. [9] derived the Stein kernel function and the Jeffery kernel function, mapped the SPD data into RKHS with these two kernels, and performed DLSR directly in the RKHS. Li et al. [17] defined special matrix logarithmic multiplication and scalar logarithmic multiplication on the original SPD manifolds, and then derived three kernel functions for DLSR. Asha et al. [18] applied DLSR of SPD data to the new field of breast tumor grading. They mapped the SPD data into RKHS with Gaussian kernel functions based on the Log-Euclidean metric, the Stein divergence, and the Jeffery divergence, respectively, for DLSR. Theirs is among the most sophisticated breast cancer grading algorithms in terms of both quantitative and qualitative analysis.
However, neither the kernel functions defined by Harandi et al. nor those defined by Li et al. contain data dependencies. They have no parameters to learn and have little correlation with the original SPD data. Once the kernel function is selected, the resulting RKHS does not change with the SPD data. Therefore, the connection between the original data and the RKHS is weak, and the best coding results cannot be achieved.
Some studies preserve the geometry of the original SPD data [19][20][21][22]. Li et al. [19] proposed a semantic and neighborhood-preserving dictionary learning method. They added a regularization term to the objective function based on the Log-Euclidean kernel function and applied the graph Laplacian smoothing operator to it. Although this method does not produce a kernel function containing data dependencies, the coding coefficient vectors retain the similarity between the original SPD data. Wu et al. [20] used the Belkin kernel function framework defined in [21] and the kernel function defined by Li et al. as basic kernel functions, and proposed two new kernel functions containing data dependencies via the graph Laplacian smoothing operator. Zhuang et al. [22] used a Gaussian kernel based on the Log-Euclidean metric as the basic kernel function, with a symmetric positive semi-definite matrix as the data dependency. However, the kernel functions proposed in [21], [22] are too complex, and learning the data dependency greatly increases the computational complexity of the algorithm.
Many approaches have been proposed for tangent spaces. Zhang et al. [23] used the Log-Euclidean mapping of SPD matrices to obtain vectorized covariance features for sparse representation. Guo et al. [24] transformed the SPD Riemannian manifold into the tangent space by matrix logarithmic mapping; each mapped SPD matrix can then be represented by a linear combination of the other mapped matrices in the tangent space. Yuan et al. [25] mapped the SPD manifold into the tangent space to calculate the sparse representation of the SPD data and applied it to action recognition. Since the tangent space of the SPD manifold is linear, mapping the SPD matrix data into it can also effectively solve the nonlinearity problem of SPD manifolds.
However, the mapping functions of the above three studies will not only bring additional computation, but also ignore the form of the mapped data. In this way, their tangent spaces only retain the local geometrical structure of the original SPD matrix data, while the characteristics of the SPD matrix data are fundamentally lost. Moreover, they have no further measures in the process of DLSR, resulting in unsatisfactory sparse coding results. The tangent space of the SPD Riemannian manifold proposed in this paper is a symmetric matrix space. Therefore, the data properties of the original SPD data can be retained to the utmost extent while effectively solving the nonlinearity problem of the SPD manifold.

IV. Dictionary Learning of Symmetric Positive Definite Data Based on Riemannian Manifold Tangent Spaces and Local Homeomorphism (RLH-SDL)
SPD data are the most popular non-Euclidean data in machine learning today. Since the set of SPD matrices cannot constitute a linear space while dictionary learning is formulated in terms of linear operations, dictionary learning cannot be performed directly on SPD data. At present, the common method is to transform SPD data into an RKHS and perform dictionary learning there. However, after the transformation, each SPD matrix becomes a function defined on the SPD data, so the form and nature of the data change significantly.
Although the entire set of SPD data cannot form a linear space, it can form a Riemannian manifold (hereinafter referred to as the SPD manifold). The tangent space of a Riemannian manifold is a finite-dimensional Hilbert space; in particular, the tangent space of the SPD manifold is the space of symmetric matrices. Since symmetric matrices include symmetric positive definite matrices, the SPD manifold can be regarded as a sub-manifold of the symmetric matrix space, and the symmetric matrix space can also be regarded as the smallest linear space containing the SPD manifold. We therefore propose to first transform the SPD data into the symmetric matrix space and then perform dictionary learning there; the learned dictionary is subsequently applied in the symmetric matrix space as well.

A. The Transform from the Riemannian Manifold to its Tangent Spaces
Let Sym_++^D denote the manifold of D × D SPD matrices and Sym^D the space of D × D symmetric matrices. For X ∈ Sym_++^D, let X = U Λ U^T be its eigendecomposition, where Λ = diag(λ_1, ..., λ_D) contains the eigenvalues of X and U is an orthogonal matrix composed of the orthonormal eigenvectors of X. The matrix logarithm is defined as

    log(X) = U log(Λ) U^T = U diag(log λ_1, ..., log λ_D) U^T.

Some studies also use the log transformation, but they do not seem to mention the following two characteristics:
(1) The log transformation is an isometric transformation under the affine-invariant Riemannian metric. That is, the geodesic distance between X and the identity matrix equals the Euclidean distance between log(X) and the origin of the tangent space. The geodesic distance derived from the affine-invariant Riemannian metric is

    d_geo(X, Y) = || log(X^{-1/2} Y X^{-1/2}) ||_F.

Then we have

    d_geo(X, I) = || log(X) ||_F.

(2) The log transformation maps the SPD matrix to a symmetric matrix. It keeps the symmetric-matrix form of the data and only applies the logarithm to the eigenvalues of the SPD matrix. Since log is a monotonically increasing function, it maintains the order of the eigenvalues of the SPD matrix. In this sense, the log transformation changes the SPD matrices the least in form and nature.
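Both characteristics can be checked numerically. The sketch below (with a random placeholder SPD matrix) confirms that log(X) is symmetric and that the AIRM geodesic distance from X to the identity equals the Frobenius norm of log(X):

```python
import numpy as np

def spd_log(X):
    """Matrix logarithm of an SPD matrix: apply log to the eigenvalues."""
    w, U = np.linalg.eigh(X)
    return (U * np.log(w)) @ U.T

def airm_dist(X, Y):
    """AIRM geodesic distance ||log(X^{-1/2} Y X^{-1/2})||_F."""
    w, U = np.linalg.eigh(X)
    Xinvs = (U * w ** -0.5) @ U.T
    M = Xinvs @ Y @ Xinvs
    return np.sqrt(np.sum(np.log(np.linalg.eigvalsh((M + M.T) / 2)) ** 2))

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)                  # a random SPD matrix
L = spd_log(X)
print(np.allclose(L, L.T))                   # log(X) stays symmetric
print(np.isclose(airm_dist(X, np.eye(4)),    # geodesic distance to I equals
                 np.linalg.norm(L, 'fro')))  # the Euclidean norm of log(X)
```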

B. Dictionary Learning of SPD Data Based on Riemannian Manifold Tangent Spaces
Given a set of dictionary learning samples {X_1, ..., X_N} ⊂ Sym_++^D on the SPD manifold, we first transform these samples into the tangent space: log(X_1), ..., log(X_N) ∈ Sym^D. We then use linear combinations of these tangent vectors to construct the dictionary in the tangent space of the SPD manifold:

    E_m = Σ_{n=1}^{N} B_{mn} log(X_n),  m = 1, ..., M,

where M is the number of dictionary atoms and B is the dictionary generating matrix. Apparently, the dictionary {E_1, ..., E_M} is completely determined by the dictionary generating matrix B in the tangent space Sym^D. Therefore, the model of dictionary learning based on the SPD manifold tangent space is

    min_{A, B} Σ_{i=1}^{N} || log(X_i) − Σ_{m=1}^{M} A_{im} E_m ||_F² + λ_SP · sparse(A),

where A is the dictionary coding matrix and its i-th row, Row_i(A), is the dictionary coding of log(X_i).
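The data-flow of this construction can be sketched as follows: each sample is mapped to its tangent vector log(X_n), the tangent vectors are vectorized into the rows of a matrix Y, and the atoms E_m are rows of B Y. The sample matrices and the generating matrix here are random placeholders, and the flattening of symmetric matrices into full row vectors is an implementation convenience.

```python
import numpy as np

def spd_log(X):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(X)
    return (U * np.log(w)) @ U.T

def tangent_features(spd_mats):
    """Stack the vectorized tangent vectors log(X_n) as the rows of Y."""
    return np.stack([spd_log(X).ravel() for X in spd_mats])

rng = np.random.default_rng(4)
mats = []
for _ in range(6):
    A = rng.standard_normal((3, 3))
    mats.append(A @ A.T + 3 * np.eye(3))     # random SPD samples
Y = tangent_features(mats)                   # N x D^2 tangent-space samples
M = 4                                        # number of dictionary atoms
B = rng.standard_normal((M, len(mats)))      # dictionary generating matrix
E = B @ Y                                    # atoms: combinations of tangent vectors
print(E.shape)
```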

C. Local Homeomorphism between SPD Data and Their Dictionary Coding
According to the mathematical definition of manifolds, a manifold is a topological space that is locally homeomorphic to Euclidean space. In our application, the manifold in question is the SPD manifold, and the Euclidean space locally homeomorphic to it is the tangent space at each SPD matrix.
where λ_SP > 0 and λ_MR > 0 are the sparsity and local homeomorphism regularization weights, respectively. In the RLH-SDL, there are two optimization variables: the dictionary coding matrix A and the dictionary generating matrix B. The dictionary generating matrix B is necessary, while the dictionary coding matrix A is a by-product; from the dictionary learning point of view it is not strictly required. However, the dictionary coding matrix is indispensable for computing the dictionary generating matrix B.

2) OBJECTIVE FUNCTION
Combining the tangent-space dictionary learning model with the local homeomorphism regularization term, the objective function of RLH-SDL is as follows:

    min_{A, B} Σ_{i=1}^{N} || log(X_i) − Σ_{m=1}^{M} A_{im} E_m ||_F² + λ_SP ||A||_1 + λ_MR Σ_{i,j} W_{ij} || Row_i(A) − Row_j(A) ||²,

where E_m = Σ_{n} B_{mn} log(X_n) and W_{ij} measures the similarity between the SPD samples X_i and X_j, so that similar SPD data are encouraged to have similar dictionary codings.

E. Solution to RLH-SDL
In the objective function of RLH-SDL, there are two optimization variables: the dictionary coding matrix A and the dictionary generating matrix B. We solve for A and B with an alternating, iterative method. First, we fix the dictionary generating matrix B and compute the dictionary coding matrix A; with B fixed, the objective reduces to a sparse coding problem in A, which we solve with the feature-sign search method [26]. Then, we fix the dictionary coding matrix A and compute the dictionary generating matrix B; with A fixed, the objective reduces to a quadratic problem in B, and setting its partial derivative with respect to B to zero (Eq. (27)) yields the update for B. The above procedure is repeated until the objective function falls below the preset threshold.
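The alternating structure can be illustrated with a simplified stand-in: the sketch below drops the local homeomorphism term, replaces feature-sign search with ISTA for the A-step, and uses a pseudo-inverse least-squares solution for the B-step of min ||Y − A B Y||_F² + λ||A||_1, where the rows of Y are the vectorized tangent vectors. All data are random placeholders; this is a shape-level sketch of the alternation, not the paper's exact solver.

```python
import numpy as np

def soft(v, t):
    """Elementwise soft-thresholding (prox of the l1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_dictionary(Y, M, lam=0.1, outer=10, inner=100, seed=0):
    """Alternate ISTA for the codes A and a closed-form least-squares
    update for the generating matrix B (simplified sketch)."""
    rng = np.random.default_rng(seed)
    N = Y.shape[0]
    B = rng.standard_normal((M, N)) / np.sqrt(N)
    A = np.zeros((N, M))
    for _ in range(outer):
        E = B @ Y                            # current atoms, one per row
        L = 2 * np.linalg.norm(E, 2) ** 2 + 1e-12
        A = np.zeros((N, M))
        for _ in range(inner):               # A-step: ISTA on the codes
            grad = 2 * (A @ E - Y) @ E.T
            A = soft(A - grad / L, lam / L)
        # B-step: least-squares minimizer of ||Y - A B Y||_F^2
        B = np.linalg.pinv(A) @ Y @ np.linalg.pinv(Y)
    return A, B

rng = np.random.default_rng(5)
Y = rng.standard_normal((8, 6))              # placeholder tangent-space samples
A, B = fit_dictionary(Y, M=4)
err = np.linalg.norm(Y - A @ B @ Y)
print(err)
```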
Algorithm 1 summarizes the main flow of the RLH-SDL algorithm.

F. Complexity Analysis
We first transform the SPD data into the tangent space of the SPD manifold, i.e., a symmetric matrix space, and then perform dictionary learning on the symmetric matrix space; subsequent applications of the dictionary are also carried out there. In this subsection, we analyze the computational complexity of RLH-SDL for a set of N dictionary learning samples of D × D SPD matrices.

V. Comparison with State-of-the-Art Algorithms
In this section, we introduce five state-of-the-art algorithms as comparisons to the proposed algorithm.
The set of SPD matrices cannot form a linear space over the real number field, whereas dictionary coding is a sparse linear encoding over a dictionary; therefore, dictionary learning cannot be performed directly on SPD data. Currently, the common approach is to transform the dictionary learning samples into an RKHS and perform dictionary learning and its applications there. An RKHS is generated uniquely by a kernel function, which is then called the reproducing kernel of that RKHS; different kernel functions therefore imply different RKHSs. In addition, SPD data form Riemannian manifolds under certain topologies and Riemannian metrics, so the transformation of SPD data into an RKHS can be regarded as a transformation from a Riemannian manifold to an RKHS.
RSR [9], SNP [19], KLE-DLSC [17], and MKSR [20] all transform dictionary learning from SPD manifolds to RKHS, but they use different Riemannian metrics and kernel functions. For example, RSR uses the Stein kernel k(X, Y) = exp(−β S(X, Y)), where S(X, Y) is the Stein divergence between X and Y, while KLE-DLSC uses the sample-dependent kernel functions proposed and proved in [17].
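As an illustration of this family of kernels, the following sketch computes the Stein divergence and the corresponding exponential kernel; the test matrices are random placeholders, and the kernel parameter β = 1 is an arbitrary choice (RSR constrains β for positive definiteness of the kernel).

```python
import numpy as np

def stein_div(X, Y):
    """Stein (S-) divergence: log det((X+Y)/2) - 0.5 log det(X Y)."""
    s1 = np.linalg.slogdet((X + Y) / 2)[1]
    s2 = np.linalg.slogdet(X)[1] + np.linalg.slogdet(Y)[1]
    return s1 - 0.5 * s2

def stein_kernel(X, Y, beta=1.0):
    """Stein-divergence-based kernel: exp(-beta * S(X, Y))."""
    return np.exp(-beta * stein_div(X, Y))

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 3)); X = A @ A.T + 3 * np.eye(3)
B = rng.standard_normal((3, 3)); Y = B @ B.T + 3 * np.eye(3)
print(stein_div(X, X))                       # zero for identical inputs
print(stein_kernel(X, Y))                    # in (0, 1] since S(X, Y) >= 0
```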
RSR, SNP, KLE-DLSC, and MKSR are all performed on RKHS. However, our algorithm first transforms the SPD data from the SPD manifold into the tangent space of the SPD manifold, i.e., the symmetric matrix space. Compared with the dictionary learning on RKHS, our algorithm retains the original form and nature of the SPD data to the greatest extent.
Because a linear combination of SPD matrices is not guaranteed to be an SPD matrix, the entire set of SPD data cannot constitute a linear space. However, if the combination coefficients are restricted to be non-negative, with at least one coefficient strictly positive, the linear combination of SPD matrices is still an SPD matrix. Riem-DLSC [15] therefore proposed a dictionary learning algorithm on SPD manifolds under a non-negative linear combination condition, in which A is a non-negative matrix with at least one positive entry in each row.
Here d_geo(·, ·) is the geodesic distance on the SPD manifold. The geodesic distance is determined by the Riemannian metric, and Riem-DLSC uses the affine-invariant Riemannian metric [27]-[29].
Riem-DLSC performs dictionary learning directly on the SPD datasets. It defines the non-negative linear combination operation and this increases the computational complexity.

VI. Experimental Results
Here, we present the experimental results of RLH-SDL and the five state-of-the-art algorithms described in Section 5 on four public datasets. In each experiment, we first divide the data into two parts, A and B. Part A is used to learn the dictionary; during dictionary learning, we also compute the dictionary coding of A. After obtaining the dictionary, we use it to encode B. The dictionary coding of A then serves as the training samples of the classifier, while the dictionary coding of B serves as its testing samples, and the classification accuracy on B is used as the performance index of each algorithm. Since the focus of this paper is not classification algorithms, we only use simple classifiers: the ridge regression-based classifier (CRR) and the K-nearest neighbor (KNN) classifier.

1) DATASET DESCRIPTION
The QMUL dataset [30] consists of human head images collected by airport terminal cameras. The entire dataset contains 20,005 images, each 50 × 50 pixels. The original dataset is divided into two parts: a training set with 11,280 images and a testing set with 8,725 images. According to the shooting angle of the head, the dataset is divided into five subsets: 1) 'back', images taken from the back of the head; 2) 'background', background images; 3) 'front', images taken from the front of the head; 4) 'left', images taken from the left side of the head; and 5) 'right', images taken from the right side of the head. Some sample images are shown in Figure 1, and the number of images in each subset is given in Table 2. In our experiment, we first generate a 13-dimensional feature vector for each pixel of an image (including, among other features, the pixel coordinates x and y). A 50 × 50 image therefore generates a group of 2,500 13-dimensional vectors, and the covariance matrix of this vector group is the SPD matrix representing the image.
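The covariance-descriptor construction can be sketched as follows. For brevity this sketch uses an illustrative 5-dimensional per-pixel feature (coordinates, intensity, gradient magnitudes) rather than the paper's 13-dimensional feature, whose exact composition is not reproduced here; the image is a random placeholder.

```python
import numpy as np

def covariance_descriptor(img):
    """Region covariance descriptor of a grayscale image.
    Per-pixel features: x, y, intensity, |dx|, |dy| (illustrative choice)."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    dy, dx = np.gradient(img.astype(float))
    F = np.stack([xs.ravel(), ys.ravel(), img.ravel(),
                  np.abs(dx).ravel(), np.abs(dy).ravel()])   # 5 x (H*W)
    C = np.cov(F)                            # 5 x 5 covariance of the features
    return C + 1e-6 * np.eye(C.shape[0])     # regularize to guarantee SPD

img = np.random.default_rng(7).random((50, 50))
C = covariance_descriptor(img)
print(np.all(np.linalg.eigvalsh(C) > 0))     # the descriptor is SPD
```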

2) THE EFFECT OF THE NUMBER OF DICTIONARIES ON RLH-SDL
The dictionary learning selects or reselects a certain number of dictionaries from the original dataset, and uses the selected dictionary to perform the sparse linear representation of the original data. Therefore, the number of dictionaries directly affects the result. If the number of dictionaries is too small, some data information may be lost and cannot be expressed effectively. If the number of dictionaries is too large, the computational complexity will be too high.
To explore the effect of the number of dictionaries on RLH-SDL, we randomly select 200 images from each subset of the training dataset as the training samples for dictionary learning. We then randomly select 300 images from each subset of the testing dataset, of which 200 serve as training samples for the classifier and 100 as testing samples for computing the classification accuracy. The training and testing of the classifier are randomly repeated 10 times, and the average classification accuracy is taken as the final result. We use two classifiers to evaluate the algorithms: the ridge regression-based classifier (CRR) and the K-nearest neighbor (KNN) classifier.
For RLH-SDL, the regularization terms are kept fixed in this experiment. Figure 2 shows the effect of the number of dictionaries on the classification accuracy of RLH-SDL. We observe that the classification accuracy of both classifiers increases rapidly with the number of dictionaries; however, once the number of dictionaries reaches 50, the accuracy saturates. Figure 3 shows the effect of the number of dictionaries on the running time of RLH-SDL. If the number of dictionaries keeps increasing, the computational cost of the algorithm keeps growing while the accuracy changes little, so the gain is not worth the cost.
As can be seen from Table 3, the classification accuracy of Riem-DLSC, which models the SPD manifold directly, is much lower than that of the other algorithms. Among the algorithms modeled in RKHS, RSR-S with the Stein kernel achieves the highest accuracy with the CRR classifier, while RSR-J with the Jeffery kernel achieves the highest accuracy with the KNN classifier. KLE-DLSC with three kernel functions based on the LE kernel framework performs poorly on the QMUL dataset, outperforming only Riem-DLSC. The MKSR algorithm with data dependency and the SNP algorithm perform close to RSR, but still worse.

3) EXPERIMENTAL RESULTS OF RLH-SDL AND COMPARISON ALGORITHM
Compared with the other five algorithms, the proposed RLH-SDL algorithm achieves the highest classification accuracy on the QMUL dataset. With the CRR classifier, the accuracy of RLH-SDL is 2.5% higher than RSR-S, and with the KNN classifier it is 2.4% higher than RSR-J. This indicates that RLH-SDL is more suitable for the QMUL dataset. The local homeomorphism preserves the similarity between the original data, so the sparse coding results better preserve the geometric structure of the original dataset. In summary, we conclude that the RLH-SDL algorithm is effective on the QMUL dataset.

1) DATASET DESCRIPTION
The original ETHZ dataset [31] is captured with a mobile camera, which provides a wide range of variations in human appearance. The dataset is divided into three sequences, which contain 4,857 images of 83 pedestrians, 1,936 images of 35 pedestrians, and 1,762 images of 28 pedestrians, respectively. Figure 4 shows some example images. Each image has 640 × 480 pixels. To convert an original image to SPD data with the regional covariance descriptor, we first resample all images to 64 × 32 pixels and then generate a 17-dimensional feature vector for each pixel, so that each image yields a group of 64 × 32 = 2,048 17-dimensional vectors. The covariance matrix of this vector group is the characteristic matrix of the image, and this characteristic matrix is an SPD matrix.
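The regional covariance descriptor described above can be sketched as follows. For brevity this sketch uses a simplified 5-dimensional per-pixel feature (position, intensity, and absolute first derivatives) as a stand-in for the paper's 17-dimensional vector; the small diagonal ridge is an assumption added to keep the covariance strictly SPD.

```python
import numpy as np

def region_covariance(img):
    """Covariance descriptor of an image: stack one feature vector per
    pixel and take the covariance matrix of the stack."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    iy, ix = np.gradient(img.astype(float))  # first-order derivatives
    # Simplified 5-D per-pixel feature: (x, y, I, |Ix|, |Iy|).
    feats = np.stack([xs.ravel().astype(float), ys.ravel().astype(float),
                      img.ravel().astype(float),
                      np.abs(ix).ravel(), np.abs(iy).ravel()], axis=0)
    # Covariance over the pixel axis; the ridge keeps it strictly SPD.
    return np.cov(feats) + 1e-6 * np.eye(feats.shape[0])

img = np.random.default_rng(0).random((64, 32))  # a 64 x 32 "image"
C = region_covariance(img)           # 5 x 5 SPD characteristic matrix
eigvals = np.linalg.eigvalsh(C)      # all strictly positive
```

The same construction with a 17-dimensional per-pixel feature yields the 17 × 17 SPD matrices used in the ETHZ experiments.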

2) THE EXPERIMENTAL RESULTS
In the experiment, we combine all three sequences of ETHZ into one overall dataset, which contains 146 pedestrians and 8,555 pictures. For each pedestrian, we randomly select 20 pictures as learning samples for the dictionary; the rest are used as testing samples for the classifier. The regularization term λ_MR is set to 0.1 and the sparsity term λ_SP is set to 0.5. The samples are randomly selected 20 times, and the average of the results is taken as the final result. Table 4 shows the experimental results.
As can be seen from Table 4, the Riem-DLSC algorithm, which models the SPD manifold directly, performs the worst, while RSR-J with the Jeffery kernel performs best among the five compared algorithms. The KLE-DLSC-Gaussian algorithm using the LE kernel framework, the MKSR algorithm using the Belkin kernel framework and data dependence, and the SNP algorithm using the Log-Euclidean kernel and Graph Laplacian regularization perform similarly, but all of them perform poorly, with a clear gap to RSR-J.

1) DATASET DESCRIPTION
The original COIL dataset [32] is a common dataset for object recognition tasks. It contains 1,440 grayscale images of 20 objects, and each image in the original dataset is 128 × 128 pixels. Each object is captured on a turntable over a full 360-degree rotation with an image taken every 5 degrees, so there are 72 images per object. Figure 5(a) shows the 20 objects and Figure 5(b) shows the 72 images of one object.
For each pixel (x, y) of the image, we define a 15-dimensional characteristic vector as follows:

f(x, y) = [ x, y, I(x, y), |I_x|, |I_y|, |I_xx|, |I_yy|, |G_1(x, y)|, …, |G_8(x, y)| ],

where I(x, y) is the gray value at (x, y), I_x, I_y, I_xx, and I_yy are the first- and second-order derivatives of the image, and G_1, …, G_8 are the responses of eight Gabor filters.

2) THE EXPERIMENTAL RESULTS
In our experiments, we randomly select 20 images of each object as training samples for dictionary learning. The remaining 52 images are used as test samples for classification. The training and test samples are randomly selected 20 times; we compute the classification accuracy after each selection and take the average as the final result. The regularization term λ_MR is set to 0.1 and the sparsity term λ_SP is set to 0.5. Table 5 shows the classification accuracy of each dictionary learning algorithm. We can see that RLH-SDL achieves the best classification accuracy, while Riem-DLSC, which models the SPD manifolds directly, is inferior to all the other comparison algorithms. Among the methods modeled in RKHS, the SNP algorithm with the Log-Euclidean kernel and Graph Laplacian regularization achieves the highest accuracy with the CRR classifier, while RSR-J with the Jeffery kernel is the highest with the KNN classifier. RSR-S with the Stein kernel, KLE-DLSC-Gaussian with the LE kernel framework, and MKSR with the Belkin kernel framework and data dependencies perform relatively poorly.

Method               Classification accuracy (%)
RSR-S [9]                 70.5
RSR-J [9]                 74.3
Riem-DLSC-Frob [15]       22.1
KLE-DLSC-Gaussian [17]    64.2
MKSR [20]                 64.7
SNP [19]                  62.7
RLH-SDL                   80.0

1) DATASET DESCRIPTION
In this experiment, we select 5 subsets from the PIE dataset according to the different poses of the individuals, i.e., PIE05 for the left pose, PIE07 for the upward pose, PIE09 for the downward pose, PIE27 for the frontal pose, and PIE29 for the right pose. In each subset, all facial images were taken under different illumination conditions or expressions. Figure 6 shows some samples of the five selected subsets. In the experiment, for each pixel of the image, we define a 43-dimensional characteristic vector as follows:

f(x, y) = [ x, y, I(x, y), |G_{0,0}(x, y)|, …, |G_{4,7}(x, y)| ],

where (x, y) is the position coordinate, I(x, y) is the gray value at (x, y), and G_{u,v}(x, y) is the response value of the two-dimensional Gabor filter G_{u,v} at (x, y). The index u represents the direction and ranges from 0 to 4, while v represents the scale and ranges from 0 to 7, so there are 40 Gabor filters in total. Therefore, a 64 × 64 image produces a vector group containing 4,096 43-dimensional vectors, and the covariance matrix of this vector group is the SPD matrix that characterizes this image.
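The 43-dimensional Gabor-based descriptor can be sketched as below: 5 directions × 8 scales give 40 filter responses, which together with the position and gray value form a 43-dimensional vector per pixel, and the 43 × 43 covariance of all pixel vectors is the SPD matrix. The Gabor parameter schedule (`sigma`, `lam`) and the small diagonal ridge are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def gabor_kernel(size, theta, scale):
    """Real part of a 2-D Gabor kernel at direction theta and the given
    scale (illustrative parameter schedule, not the paper's)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    sigma, lam = 2.0 + scale, 4.0 + 2.0 * scale  # assumed schedule
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_covariance(img):
    """43 x 43 SPD descriptor: (x, y, I) plus 40 Gabor magnitudes per pixel."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = [xs.ravel().astype(float), ys.ravel().astype(float), img.ravel()]
    F = np.fft.fft2(img)
    for u in range(5):           # 5 directions
        for v in range(8):       # 8 scales -> 40 filters in total
            k = gabor_kernel(9, np.pi * u / 5, v)
            K = np.fft.fft2(k, s=img.shape)   # circular convolution via FFT
            resp = np.real(np.fft.ifft2(F * K))
            feats.append(np.abs(resp).ravel())
    feats = np.stack(feats, axis=0)           # 43 x (h * w)
    # Covariance over pixels; the ridge keeps the matrix strictly SPD.
    return np.cov(feats) + 1e-6 * np.eye(feats.shape[0])

img = np.random.default_rng(0).random((64, 64))  # a 64 x 64 "face image"
C = gabor_covariance(img)                        # 43 x 43 SPD matrix
```

A 64 × 64 input thus contributes 4,096 pixel vectors, matching the vector-group size stated above.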

2) THE EXPERIMENTAL RESULTS
In the experiment, we randomly select 20 images of each person in each selected PIE subset as the training dataset for dictionary learning; the remaining images are used as the test dataset of the classifier. PIEALL denotes the entire PIE dataset; for the PIEALL experiment, we randomly choose 70 images of each person for dictionary learning and use the rest as the test dataset. The above process is repeated 20 times to obtain reliable statistics. The regularization term λ_MR is set to 0.1 and the sparsity term λ_SP is set to 0.5. Here we use the KNN classifier. We compare the performance of the proposed algorithm with other SPD data DLSR algorithms, including KLE-DLSC, RSR, Riem-DLSC, MKSR, SNP, and BLKM-DL. Table 6 shows the comparison. From Table 6, we can see that the Riem-DLSC algorithm, which models SPD manifolds directly, performs very poorly on all subsets of PIE and on the entire PIE dataset, with a significant gap to the other comparison algorithms; modeling directly on the SPD manifold is therefore not suitable for the PIE dataset. Among the other comparison algorithms, RSR-S with the Stein kernel achieves the highest accuracy on the four subsets PIE05, PIE07, PIE09, and PIE27, as well as on PIEALL, while the KLRM-DL algorithm with the Log-Euclidean kernel and data dependence is the highest on PIE29. The average accuracy of RSR-S is the highest among the comparison algorithms.
The RLH-SDL algorithm proposed in this paper achieves the best performance on every subset and on the entire PIE dataset. This shows that RLH-SDL, which maps the original data into the tangent space of the SPD Riemannian manifold, is more suitable for the PIE dataset. The Graph Laplacian regularization well preserves the similarity between the original data, making the sparse codes more discriminative.

VII. CONCLUSION
Non-Euclidean data appear more and more often in various machine learning applications. The so-called non-Euclidean data are data that cannot form a Euclidean space. At present, the Reproducing Kernel Hilbert Space (RKHS) is a common mathematical platform for dealing with this problem. However, an RKHS is an infinite-dimensional Hilbert space, which is not isomorphic to a Euclidean space. Although non-Euclidean data cannot form a Euclidean space, they can often form a Riemannian manifold, and the tangent spaces of a Riemannian manifold are all finite-dimensional Hilbert spaces isomorphic to Euclidean spaces. Therefore, this paper proposes to take the tangent spaces of the Riemannian manifold as another mathematical platform for handling non-Euclidean data, on which various machine learning algorithms for non-Euclidean data can be developed. Furthermore, since the tangent space at a point of the Riemannian manifold is locally homeomorphic to a neighborhood of that point, a dictionary learning algorithm for non-Euclidean data based on the tangent spaces of the Riemannian manifold and local homeomorphism is proposed in this paper. The experimental results presented in this paper show the effectiveness of the proposed algorithm.
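The tangent-space mapping discussed above can be sketched as follows. This is a minimal illustration, assuming the affine-invariant Riemannian metric on SPD matrices (not necessarily the paper's exact formulation): the Riemannian logarithm maps an SPD matrix X to a tangent vector at an anchor P, and the tangent vector's length, measured in the metric at P, equals the geodesic distance from P to X, which is the length-preserving property described in the abstract.

```python
import numpy as np

def _sym_fun(A, fun):
    """Apply a scalar function to a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * fun(w)) @ V.T

def log_map(P, X):
    """Tangent vector Log_P(X) = P^(1/2) logm(P^(-1/2) X P^(-1/2)) P^(1/2)."""
    P_half = _sym_fun(P, np.sqrt)
    P_ihalf = _sym_fun(P, lambda w: 1.0 / np.sqrt(w))
    inner = _sym_fun(P_ihalf @ X @ P_ihalf, np.log)
    return P_half @ inner @ P_half

def geodesic_dist(P, X):
    """Affine-invariant geodesic distance between SPD matrices P and X."""
    P_ihalf = _sym_fun(P, lambda w: 1.0 / np.sqrt(w))
    return np.linalg.norm(_sym_fun(P_ihalf @ X @ P_ihalf, np.log), 'fro')

rng = np.random.default_rng(0)
A = rng.random((4, 4)); P = A @ A.T + 4 * np.eye(4)   # random SPD anchor
B = rng.random((4, 4)); X = B @ B.T + 4 * np.eye(4)   # another SPD point
V = log_map(P, X)                 # tangent vector at the anchor P
d = geodesic_dist(P, X)           # geodesic length from P to X
# The tangent vector's length in the metric at P equals the geodesic length:
P_ihalf = _sym_fun(P, lambda w: 1.0 / np.sqrt(w))
n = np.linalg.norm(P_ihalf @ V @ P_ihalf, 'fro')
```

Once all SPD samples are mapped to tangent vectors at a common anchor, they live in a finite-dimensional Euclidean space where standard dictionary learning and sparse coding apply directly.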