Domain Adaptation Based on Symmetric Matrices Space Bi-Subspace Learning and Source Linear Discriminant Analysis Regularization

At present, Symmetric Positive Definite (SPD) matrix data are the most common non-Euclidean data in machine learning. Because SPD data do not form a linear space, most machine learning algorithms cannot be carried out directly on them. The first purpose of this paper is to propose a new framework for machine learning on SPD data, in which SPD data are transformed into the tangent space of a Riemannian manifold rather than into a Reproducing Kernel Hilbert Space (RKHS) as usual. The second purpose of this paper is to apply the proposed framework to domain adaptation learning (DAL), adopting a bi-subspace learning architecture. Compared with the commonly used single-subspace architecture, the proposed architecture provides a broader optimization space in which to satisfy the domain adaptation criterion. Finally, in order to further improve classification accuracy, a Linear Discriminant Analysis (LDA) regularization of the source domain data is added. Experimental results on five real-world datasets demonstrate that the proposed algorithm outperforms five related state-of-the-art algorithms.


I. INTRODUCTION
Traditional machine learning methods usually work well on image recognition tasks such as face recognition and object recognition only under a common assumption: the marginal probability distributions of the training data and the test data are the same or similar [1]. However, in practical applications, changes in factors such as illumination, background, viewing angle, posture, and image resolution cause the distributions of the training data and the test data to mismatch. DAL algorithms aim to reduce the difference between the distributions of the training data and the test data [2].
DAL tasks usually involve two datasets. One is called the source domain, which contains a large amount of labeled data. The other is called the target domain; its marginal probability distribution often differs from that of the source domain, which means that the feature representations are different. Target domain data are often unlabeled, or only partly labeled. Therefore, there are usually two types of DAL tasks. One is semi-supervised DAL [2]-[4], in which part of the target domain data is labeled and we can use the source domain data together with the labeled portion of the target domain data to learn the intrinsic properties of the data during the domain shift. The other is unsupervised DAL [5]-[9], in which all of the target domain data are unlabeled and we can only exploit the target domain data without label information to achieve domain adaptation [10]-[12]. This paper focuses only on the unsupervised DAL problem, because in the real world unlabeled data can be obtained in large quantities in a variety of ways, whereas identifying their labels is difficult. In this sense, unsupervised DAL algorithms are more applicable to realistic object recognition scenarios [7]-[9]. (The associate editor coordinating the review of this manuscript and approving it for publication was Baozhen Yao.)
Unsupervised DAL algorithms usually focus on finding a new feature space for the data. In this space, the source and target domain data share similar feature representations, which means that the distributions of the two domains are matched. Thus, a classifier trained on the source domain data can still work well on the target domain. At present, the most commonly used such space is the Reproducing Kernel Hilbert Space (RKHS).
It is worth noting that most advanced DAL algorithms still concentrate on Euclidean data: the original data those algorithms process are feature vectors. However, visual data such as images and videos are increasingly complex and varied, and Euclidean feature vectors are no longer able to represent such massive amounts of information precisely and efficiently. Recent research suggests that a wide range of computer vision problems can be addressed more appropriately by considering non-Euclidean geometry, such as Riemannian manifolds. After certain feature extractions, the original visual data can be transformed onto Riemannian manifolds, where they retain the Riemannian geometric structure of the original visual data. Many practical visual recognition tasks, such as image set classification [13], [14], video face recognition [15], [16], and visual tracking [17], also show that Riemannian data are more discriminative than Euclidean data. Therefore, Riemannian manifolds, especially Symmetric Positive Definite (SPD) manifolds, have been applied widely in computer vision tasks. SPD data (usually represented as SPD matrices) embedded in the SPD manifold can encode regional image features, and more and more researchers propose algorithms that replace Euclidean data with SPD data as data descriptors. Riemannian manifolds already perform well in machine learning research areas such as dictionary learning [18]-[20], kernel learning [21], [22], metric learning [23], discriminative analysis [24], [25], and dimensionality reduction [26], [27].
However, in the field of DAL, there is still very little research on such Riemannian data. The reason is that although Riemannian manifolds such as the SPD manifold and the Grassmann manifold are metric spaces, they are not linear spaces, which means that linear operations cannot be performed on them directly. Therefore, traditional machine learning algorithms, which require not only metrics but also linear operations, cannot be used directly on Riemannian manifolds [28], [29]; this is also why current DAL algorithms still deal with Euclidean data. In contrast, the DAL algorithm proposed in this paper directly processes Riemannian (SPD) data, which means that we extend the DAL algorithm from Euclidean space to a Riemannian manifold (the SPD manifold).
The biggest challenge of SPD data is their non-Euclidean structure. In the past, many researchers applied traditional Euclidean learning methods directly to SPD data, producing the adverse swelling effect known from diffusion tensor processing [30], [31], which degrades algorithm performance. The Affine Invariant Riemannian Metric (AIRM) [30], the Log-Euclidean metric [31], and the Projection Metric (PM) [32] are classic metrics. To overcome the limitations of Euclidean representations, research applying these metrics to SPD data has also developed recently, effectively extending traditional Euclidean learning algorithms to manifolds of SPD data. To solve the problem of linear operations, machine learning algorithms usually process SPD data in one of two ways: one is to locally flatten them via tangent spaces [23], [33]; the other, and the most common, is to map them into the RKHS [20], [22] using the kernel trick, which transforms an SPD matrix into a function defined on the whole SPD manifold. Instead of adopting the RKHS as the workspace for DAL on the SPD manifold, we explore a different space, called the Symmetric Matrices Space (SMS). The data transformed into the SMS are symmetric matrices rather than functions as in the RKHS, and the transformation is nearly an identity transformation, which means that the original form of the data is preserved in the SMS. The SMS is an inner product space (a linear space equipped with an inner product) and supports linear operations and distance measurements. Therefore, from the perspective of enabling linear operations on the SPD manifold, the SMS is the smallest linear extension of the SPD manifold and seems a more natural and suitable workspace than the widely used RKHS.
The SMS has been used for machine learning problems on the Grassmann manifold [34], [35], but seemingly never for the SPD manifold. The algorithm we propose may be the first attempt to exploit the SMS as the workspace for solving DAL problems on the SPD manifold.
The contributions of this paper are summarized as follows: (1) Non-Euclidean data are becoming more and more common in machine learning. Non-Euclidean datasets do not form a linear space and therefore do not support linear operations, yet most machine learning algorithms involve a large number of linear operations and thus cannot be carried out directly on non-Euclidean datasets. At present, the common method is to transform the non-Euclidean dataset into an RKHS and then carry out machine learning there; this is the framework of the so-called kernel trick. The transformation from a non-Euclidean dataset to an RKHS is nonlinear: a non-Euclidean datum is transformed into a function defined on the entire non-Euclidean dataset. Under such a transformation, the datum undergoes significant changes in form and nature, and machine learning on the transformed dataset may no longer serve the original intention. Although a non-Euclidean dataset does not form a linear space, it can often form a Riemannian manifold, and the tangent space of a Riemannian manifold is a finite-dimensional Hilbert space, that is, a Euclidean space. Therefore, this paper proposes a new framework for machine learning on non-Euclidean datasets, in which, with the help of the Riemannian manifold, a non-Euclidean dataset is transformed into a tangent space of its Riemannian manifold, that is, into Euclidean space, and machine learning is thereby transferred from the non-Euclidean dataset to Euclidean space. This framework differs from the RKHS framework and is universal. (Note: an RKHS is an infinite-dimensional Hilbert space and therefore not a Euclidean space.)
(2) SPD datasets are typical non-Euclidean datasets, and they become Riemannian manifolds under a certain topology and Riemannian metric. In particular, the tangent space of the SPD manifold is the space of symmetric matrices, and the symmetric matrices include the SPD matrices. Therefore, the tangent space of the SPD manifold is actually the minimal linear extension of the SPD manifold, that is, the smallest linear space containing it. The algorithm proposed in this paper uses the log transformation to map an SPD matrix to a symmetric matrix, realized simply by taking the logarithm of the eigenvalues in the eigendecomposition of the SPD matrix. This transformation preserves the symmetry of the SPD matrix and changes the matrix as little as possible.
(3) In this paper, the proposed framework of non-Euclidean machine learning based on the Riemannian manifold tangent space is applied to domain adaptation learning: the SPD source and target domain data are first transformed into the tangent space of the SPD manifold, and then two subspaces of the tangent space are learned so that the probability distributions of the source and target domain data become as close as possible when projected into these subspaces. The advantage of the bi-subspace setting is that it expands the optimization space and avoids possible local extremum traps. More importantly, different regularization terms can be added according to the different characteristics of the source and target domain data. In this paper, an LDA regularization term on the source domain subspace is added.
The rest of this paper is organized as follows. Section II briefly introduces the related concepts and mathematical properties. Section III reviews related work. Section IV proposes SMSBL-DA. Section V compares the proposed method with five advanced DAL algorithms in theory. Section VI describes the classification experiments on five real-world datasets. Finally, Section VII concludes the paper.

A. RIEMANNIAN MANIFOLD
A Riemannian metric can be defined on a differential manifold so that the differential manifold becomes a Riemannian manifold. The so-called Riemannian metric is a symmetric, positive definite, and smooth second-order tensor field on the differential manifold. If we define an inner product on the tangent space at each point of the differential manifold, then a symmetric, positive definite second-order tensor field (an inner product field) is defined on the manifold. The difference between an inner product field and a Riemannian metric therefore lies in whether the inner product field is smooth on the differential manifold.
A differential manifold equipped with a Riemannian metric is called a Riemannian manifold. Using the Riemannian metric, we can define the length of any curve on the Riemannian manifold, and then the distance between any two points on it, which is called the geodesic distance. The geodesic distance satisfies the three axioms of a distance; therefore, a Riemannian manifold is a metric (distance) space.
Since the tangent space at each point of a differential manifold is a finite-dimensional linear space, once an inner product is defined it becomes a finite-dimensional inner product space, and a finite-dimensional inner product (Hilbert) space is complete; so the tangent space at each point of a Riemannian manifold is a finite-dimensional Hilbert space. A finite-dimensional Hilbert space is isomorphic to the Euclidean space of the same dimension. Therefore, the tangent space at each point of a Riemannian manifold is essentially a Euclidean space.

B. PROJECTION THEOREM IN HILBERT SPACES
Let $G$ be a subspace of a Hilbert space $H$ with orthonormal basis $\{e_1, \cdots, e_d\} \subseteq H$, and let $x \in H$. Then the projection $z$ of $x$ onto the subspace $G$ is
$$z = \sum_{i=1}^{d} \langle x, e_i \rangle e_i,$$
and
$$y = \big(\langle x, e_1 \rangle, \cdots, \langle x, e_d \rangle\big)^T \in \mathbb{R}^d$$
is the coordinate vector of the projection $z$ with respect to the orthonormal basis of $G$.

D. FROBENIUS INNER PRODUCT AND NORM OF MATRICES
Define $K = \{A \mid A \in \mathbb{R}^{n \times m}\}$; obviously, $K$ is a linear space over the real numbers. Further, define the inner product: for any $A, B \in K$,
$$\langle A, B \rangle_F = \mathrm{tr}(A^T B),$$
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. Obviously, $\langle \cdot, \cdot \rangle_F$ satisfies the conditions of an inner product, namely symmetry, positive definiteness, and bilinearity. $\langle \cdot, \cdot \rangle_F$ is called the Frobenius inner product of matrices, and the norm it induces is called the Frobenius norm, denoted $\| \cdot \|_F = \sqrt{\langle \cdot, \cdot \rangle_F}$. Remark 1: $\langle A, B \rangle_F$ essentially multiplies the corresponding elements of the two matrices and sums the products. Therefore, if the matrices $A$ and $B$ are vectorized as $a$ and $b$, then $\langle A, B \rangle_F = a^T b$.
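Remark 1 can be verified directly: the trace form, the elementwise form, and the vectorized dot product of the Frobenius inner product all coincide. A small NumPy check on hypothetical random matrices:

```python
import numpy as np

# Frobenius inner product <A, B>_F = tr(A^T B): multiply corresponding
# entries and sum, i.e. the ordinary dot product of the vectorized matrices.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

ip_trace = np.trace(A.T @ B)
ip_elem = np.sum(A * B)             # elementwise product, then sum
ip_vec = A.ravel() @ B.ravel()      # <vec(A), vec(B)>

assert np.isclose(ip_trace, ip_elem) and np.isclose(ip_trace, ip_vec)
# The induced norm is exactly NumPy's Frobenius norm.
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(A * A)))
```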

III. RELATED WORKS

A. DOMAIN ADAPTATION BASED ON SUBSPACE
Existing DAL algorithms for visual recognition tasks can be roughly divided into two types: the first is based on subspaces [36]-[41], and the second is based on sample reweighting [42], [43]. DAL algorithms based on sample reweighting aim to compare the distributions of the two domains in the original feature space. But the feature space may not be appropriate for the specific task, because features extracted from images may be distorted during the domain shift, and some features may be relevant only to a specific domain and not applicable to others [44]. Therefore, subspace-based DAL algorithms are more widely studied and have achieved greater success. Our algorithm is also subspace-based, so we introduce some subspace-based DAL algorithms in the following paragraphs.
In 2008, Pan et al. [45] first applied the Maximum Mean Discrepancy (MMD) criterion to DAL, proposing the Maximum Mean Discrepancy Embedding (MMDE) dimensionality-reduction domain adaptation algorithm. MMD is a common criterion for measuring the difference between the probability distributions of the source and target domain data. MMDE uses kernel functions to map the data into the RKHS and then applies the MMD criterion there to build a learning model, learning a kernel space that reduces the distribution difference between the source and target domains while maintaining the variance of the data. MMDE learns kernel functions from the data, with the disadvantage that its computational cost is too high. Thus, based on MMDE, Pan et al. [46] proposed a new learning method for domain adaptation in 2011, Transfer Component Analysis (TCA). To address the fact that TCA is only applicable to unsupervised learning, Pan et al. [46] also proposed a semi-supervised method, Semi-Supervised Transfer Component Analysis (SSTCA), in 2011. More specifically, to optimize TCA, they minimize the probability distribution difference between the source and target domain data in the embedded space and introduce the Hilbert-Schmidt Independence Criterion (HSIC) to estimate the dependencies between labels and data more accurately. They also introduced a manifold regularization term to maintain the local data geometry. The biggest difference between MMDE and TCA (SSTCA) is that MMDE learns a kernel function, while TCA (SSTCA) predefines one. It is worth highlighting that if the kernel function in TCA (SSTCA) were also optimized, there would be no essential difference between MMDE and TCA. In 2017, Jiang et al. [47] proposed Integration of Global and Local Metrics for Domain Adaptation Learning (IGLDA), which uses a kernel method to solve DAL problems by optimizing two objectives.
In this algorithm, they proposed a new mapping function that makes the distribution difference between the source and target domain data (their MMD) as small as possible while preserving the local geometric structure of the labeled source domain data, with the expectation that problems in the target domain can be solved with knowledge obtained in the source domain. They introduced category information, as in SSTCA, to minimize the within-class distance. In this sense, IGLDA can be seen as an extension of SSTCA, but there are essential differences between the two algorithms: SSTCA needs to make rich use of the dependencies between labels and samples, so it exploits the covariance between labels and the mapped data, whereas IGLDA does not need this constraint; the dependency between labels and data is handled by bringing similar samples closer together so that samples are better distinguished in the common space. Furthermore, in 2019, Li et al. [48] proposed a common framework for unsupervised heterogeneous domain adaptation, called Transfer Independently Together (TIT). They theoretically studied two strategies, namely distribution matching and knowledge adaptation: the distribution matching part better handles heterogeneous data, and the knowledge adaptation part addresses transferring as much knowledge as possible in an unsupervised domain adaptation setting. In 2020, Zhang et al. [49] proposed an unsupervised domain-adaptive algorithm based on subspace learning, called Guide Subspace Learning (GSL). The main idea of GSL is to use two different projection matrices to project the source domain data and the target domain data into their respective subspaces.

B. MACHINE LEARNING BASED ON MANIFOLD
Manifold-based machine learning methods [50], [51] have been studied successfully in recent years. The basic idea is to apply principal component analysis (PCA) to domain adaptation to obtain the geodesic distance between two points on the Grassmann manifold, then sample points along the geodesic and construct constraints on the subspaces, and finally map the source domain data into the subspace and learn there. This approach underlies many studies, on the basis of which researchers have proposed several effective manifold-based methods for domain adaptation. Fernando et al. [52] in 2013 proposed a method called subspace alignment, where the source and target domain data are represented in subspaces created by eigenvectors, similar to PCA. Ho and Gopalan [53] in 2014 deduced a latent domain (or subspace) by using points to represent objects; more specifically, based on a study of tensor space, they characterize the latent domain with a product of Grassmann manifolds and successfully apply it to video and image classification tasks. In 2014, Cui et al. [54] presented a method that differs from the literature in that the representation of a domain is expressed directly as a covariance matrix, considering the geodesic distance between the source and target domain data on the Riemannian manifold; the learning samples are then mapped to the middle of the geodesic so that domain-invariant features can be extracted.
Various machine learning methods based on Riemannian manifolds have achieved great success in recent years, and the success of classification applications based on Riemannian manifolds is largely due to the introduction of metric learning. To extend the metric learning of SPD matrices, Zhou et al. [33] proposed learning a distance metric for d × d SPD matrices; after optimization over the eigenvalues of the SPD matrices, the number of learned parameters is reduced to d. Based on the fact that the SPD manifold equipped with a Riemannian metric becomes a Riemannian manifold, Xie et al. [55] proposed producing an intrinsic sub-manifold using a bilinear mapping, and then learning this sub-manifold with the bilinear sub-manifold learning (BSML) algorithm. To solve the problem of measuring the distance between two heterogeneous spaces, Huang et al. [23] proposed the CERML framework for distance metric learning: they perform metric learning on traditional Riemannian geometry through kernel embedding, mapping the Riemannian manifold and the source Euclidean space together into the same Euclidean subspace, thus transforming the original problem into Euclidean distance learning in a common Euclidean space. In 2021, Tang et al. [56] proposed adding an appropriate Riemannian metric to Generalized Learning Vector Quantization (GLVQ), where the input features change from a Euclidean vector representation to an SPD matrix representation.

IV. DOMAIN ADAPTATION BASED ON SYMMETRIC MATRICES SPACE BI-SUBSPACE LEARNING AND SOURCE LINEAR DISCRIMINANT ANALYSIS REGULARIZATION

A. THE ISOMETRY TRANSFORM BETWEEN SPD MANIFOLD AND ITS TANGENT SPACE
A differential manifold is just a topological space: there is no distance metric between elements, let alone linear operations. Although the tangent space of a differential manifold supports linear operations, there is no distance metric between tangent vectors. To support machine learning, a Riemannian metric must be assigned to the differential manifold, making it a Riemannian manifold. The SPD manifold is one of the most commonly used Riemannian manifolds in machine learning. The space of $D \times D$ SPD matrices is called the SPD manifold, denoted $\mathrm{Sym}_{++}^D$, and the set of all $D \times D$ symmetric matrices is denoted $\mathrm{Sym}^D$. It can be proved that for any $X \in \mathrm{Sym}_{++}^D$, the tangent space of the SPD manifold is linearly isomorphic to the SMS [29], that is, $T_X \mathrm{Sym}_{++}^D = \mathrm{Sym}^D$. From the Riemannian point of view, the tangent space at each point is an inner product space, and a finite-dimensional inner product space is complete; that is, the tangent space at each point of a Riemannian manifold is a finite-dimensional Hilbert space. Finite-dimensional Hilbert spaces are isomorphic to Euclidean spaces of the same dimension. Therefore, the tangent space at each point of a Riemannian manifold is essentially a Euclidean space.
To transfer machine learning from the SPD manifold to its tangent space, we must first transform the SPD learning samples into the tangent space of the SPD manifold. We use the transformation $\log: \mathrm{Sym}_{++}^D \to \mathrm{Sym}^D$. Let $X = U \mathrm{diag}(\lambda_1, \cdots, \lambda_D) U^T$, where $\lambda_1 \geq \cdots \geq \lambda_D > 0$ are the eigenvalues of $X$ and $U$ is an orthogonal matrix composed of the orthonormal eigenvectors of $X$; then we define
$$\log(X) = U \mathrm{diag}(\log \lambda_1, \cdots, \log \lambda_D) U^T.$$
Some papers also use the log transformation, but it has the following two characteristics that seem to have gone unmentioned: (1) Under the AIRM [30], the log transformation is an isometric transformation; that is, the geodesic distance between $X$ and the identity matrix equals the Euclidean distance between $\log(X)$ and the origin of the tangent space. The AIRM on the SPD manifold is as follows. Let $X \in \mathrm{Sym}_{++}^D$ and let $T_X \mathrm{Sym}_{++}^D$ be the tangent space at $X$; then for any $\Xi, H \in T_X \mathrm{Sym}_{++}^D$,
$$\langle \Xi, H \rangle_X = \mathrm{tr}(X^{-1} \Xi X^{-1} H).$$
If $X = I_D$ (the identity matrix), then the inner product and the induced norm in $T_{I_D} \mathrm{Sym}_{++}^D$ reduce to the Frobenius inner product and norm:
$$\langle \Xi, H \rangle_{I_D} = \mathrm{tr}(\Xi H), \qquad \| \Xi \|_{I_D} = \| \Xi \|_F.$$
The geodesic distance induced by the AIRM is
$$\delta(X, Y) = \| \log(X^{-1/2} Y X^{-1/2}) \|_F.$$
So we have
$$\delta(X, I_D) = \| \log(X) \|_F = \| \log(X) \|_{I_D},$$
which is the Euclidean distance between $\log(X)$ and the origin in $T_{I_D} \mathrm{Sym}_{++}^D$. (2) Since SPD data do not constitute a linear space, the usual approach is to transform them into one. Currently, the most commonly used linear space is the RKHS. However, in that transformation, an SPD matrix becomes a function defined on the SPD manifold, and the matrix form and properties of the SPD matrix are lost; using such data for machine learning may no longer serve the original intention. Looking back at the log transformation, it only applies the logarithm to the eigenvalues in the eigendecomposition of the SPD matrix, and since the log function is monotonically increasing, it preserves the order of those eigenvalues. Therefore, the log transformation changes the SPD matrix the least in form and nature.
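The log transformation and the isometry property (1) above can be sketched in a few lines of NumPy. This is a minimal illustration on a hypothetical random SPD matrix; the helper name `spd_log` is ours.

```python
import numpy as np

def spd_log(X):
    """Matrix logarithm of an SPD matrix via its eigendecomposition:
    X = U diag(l_1..l_D) U^T  ->  log(X) = U diag(log l_1..log l_D) U^T."""
    lam, U = np.linalg.eigh(X)
    return U @ np.diag(np.log(lam)) @ U.T

# Hypothetical random SPD matrix X = A A^T + I.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
X = A @ A.T + np.eye(4)

# AIRM geodesic distance between X and the identity is ||log(X)||_F,
# i.e. the Euclidean distance of log(X) from the origin of the tangent space.
geodesic = np.sqrt(np.sum(np.log(np.linalg.eigvalsh(X)) ** 2))
assert np.isclose(geodesic, np.linalg.norm(spd_log(X), 'fro'))
```

The check works because $\|\log(X)\|_F^2 = \sum_i (\log \lambda_i)^2$, which is exactly $\delta(X, I_D)^2$ under the AIRM.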

B. THE FRAMEWORK OF BI-SUBSPACE LEARNING OF SPD MANIFOLD UNIT MATRIX TANGENT SPACE
Non-Euclidean data (data not lying in a linear space) are becoming more and more common in machine learning, but current machine learning algorithms involve a large number of linear operations, so non-Euclidean data cannot be processed directly. The current common approach is to map non-Euclidean data into a Hilbert space, that is, to transfer DAL to a Hilbert space for learning. We can transfer machine learning from the SPD manifold to the SMS under certain conditions. This paper studies the domain adaptation of SPD data in the SMS, so the given learning samples are SPD data. From the perspective of the SMS, these SPD data lie in one corner of the space, and we extend the domain adaptation of SPD data to that of symmetric matrix data; therefore, we need to spread the given SPD learning samples over the entire SMS. We use the log transformation from the previous section to transform the data into the tangent space of the Riemannian manifold, not into the RKHS: the tangent space is a finite-dimensional Hilbert space, which is a Euclidean space, whereas the RKHS is an infinite-dimensional Hilbert space, not a Euclidean space. At present, most machine learning algorithms are developed in Euclidean space and can be applied directly in the Riemannian manifold tangent space. Here, we first give the framework of bi-subspace learning in the tangent space at the identity matrix of the SPD manifold.

1) THE CONSTRUCTION AND CONSTRAINT OF THE BI-SUBSPACE OF SPD MANIFOLD UNIT MATRIX TANGENT SPACE
Given two sets of learning samples $X_1^s, \cdots, X_{N_s}^s \subseteq \mathrm{Sym}_{++}^D$ and $X_1^t, \cdots, X_{N_t}^t \subseteq \mathrm{Sym}_{++}^D$ on the SPD manifold, the log transformation maps them into the tangent space $T_{I_D}\mathrm{Sym}_{++}^D$ at the identity matrix $I_D$. We take the construction of the orthonormal basis $\{\theta_{s1}, \cdots, \theta_{sd}\}$ of one subspace of the bi-subspace, as linear combinations of $\log(X_1^s), \ldots, \log(X_{N_s}^s)$, as an example; the construction of the orthonormal basis $\{\theta_{t1}, \cdots, \theta_{td}\}$ of the other subspace is the same:
$$\theta_{sj} = \sum_{i=1}^{N_s} (W_s)_{ij} \log(X_i^s), \quad j = 1, \cdots, d.$$
The orthonormality requirement $\langle \theta_{si}, \theta_{sj} \rangle_F = \delta_{ij}$ is equivalent to the constraint
$$W_s^T P_s W_s = I_d, \qquad (P_s)_{ij} = \langle \log(X_i^s), \log(X_j^s) \rangle_F.$$
Obviously, once the learning samples $X_1^s, \cdots, X_{N_s}^s, X_1^t, \cdots, X_{N_t}^t$ are given, the bi-subspace is determined by the coefficient matrices $W_s$ and $W_t$. Therefore, in this paper, we use $W_s$ and $W_t$ to represent the bi-subspace. Both subspaces $W_s$ and $W_t$ are $d$-dimensional and must satisfy the above constraints.
Remark 2: Suppose $\log(\mathbf{X}_s) = [\log(X_1^s) \cdots \log(X_{N_s}^s)]$ and $\log(\mathbf{X}_t) = [\log(X_1^t) \cdots \log(X_{N_t}^t)]$, where each $\log(X_i^s)$ and $\log(X_j^t)$ is expressed as a column vector, $i = 1, \ldots, N_s$, $j = 1, \ldots, N_t$. Then
$$P_s = \log(\mathbf{X}_s)^T \log(\mathbf{X}_s), \qquad P_t = \log(\mathbf{X}_t)^T \log(\mathbf{X}_t).$$
This means that $P_s$ and $P_t$ are at least symmetric positive semi-definite matrices; for most learning samples, $P_s$ and $P_t$ are symmetric positive definite. A brief schematic diagram of the construction of the bi-subspace of the SPD manifold unit matrix tangent space is shown in Fig. 1.

FIGURE 1.
Schematic diagram of the construction of the bi-subspace of the SPD manifold unit matrix tangent space. Firstly, the original SPD data are mapped into the tangent space at the identity matrix of the SPD manifold. Then, the source domain data and target domain data are projected into their respective subspaces $W_s$ and $W_t$ of that tangent space according to the projection theorem.
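The subspace construction above can be sketched numerically: given the Gram matrix $P_s$ of the vectorized tangent-space samples, any $W_s$ with $W_s^T P_s W_s = I_d$ yields an orthonormal basis $\Theta_s = \log(\mathbf{X}_s) W_s$. Below is a minimal sketch on hypothetical random data; the particular choice $W_s = U_d \Sigma_d^{-1/2}$ from the eigendecomposition of $P_s$ is one valid construction, not necessarily the one the algorithm ultimately learns.

```python
import numpy as np

rng = np.random.default_rng(3)
D, Ns, d = 4, 10, 3

# Hypothetical tangent-space samples, vectorized as columns:
# stands in for [vec(log X_1) ... vec(log X_Ns)].
Ls = rng.standard_normal((D * D, Ns))
Ps = Ls.T @ Ls            # Gram matrix, symmetric positive (semi-)definite

# One coefficient matrix satisfying the constraint: W_s = U_d S_d^{-1/2}
# from Ps = U S U^T, so W_s^T Ps W_s = I_d and Theta = Ls W_s is orthonormal.
S, U = np.linalg.eigh(Ps)
Ws = U[:, -d:] / np.sqrt(S[-d:])   # keep the d largest eigenpairs
Theta = Ls @ Ws

assert np.allclose(Ws.T @ Ps @ Ws, np.eye(d))
assert np.allclose(Theta.T @ Theta, np.eye(d))
```

The two assertions confirm that the constraint on $W_s$ is exactly the orthonormality of the basis $\Theta_s$ it induces.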

2) THE EXPRESSION OF SPD DATA IN THE BI-SUBSPACE OF SPD MANIFOLD UNIT MATRIX TANGENT SPACE
For any $X_s \in \mathrm{Sym}_{++}^D$ and $X_t \in \mathrm{Sym}_{++}^D$, we have $\log(X_s), \log(X_t) \in T_{I_D}\mathrm{Sym}_{++}^D$; we then project $\log(X_s)$ and $\log(X_t)$ into the subspaces $W_s$ and $W_t$ of $T_{I_D}\mathrm{Sym}_{++}^D$, respectively. According to the projection theorem, if $\{\theta_{s1}, \cdots, \theta_{sd}\}$ and $\{\theta_{t1}, \cdots, \theta_{td}\}$ are the orthonormal bases of $W_s$ and $W_t$, then the coordinates of the projections of $\log(X_s)$ and $\log(X_t)$ onto $W_s$ and $W_t$ are
$$y^s = \big(\langle \log(X_s), \theta_{s1} \rangle_F, \cdots, \langle \log(X_s), \theta_{sd} \rangle_F\big)^T, \qquad y^t = \big(\langle \log(X_t), \theta_{t1} \rangle_F, \cdots, \langle \log(X_t), \theta_{td} \rangle_F\big)^T.$$
In particular, for the given learning samples $X_1^s, \ldots, X_{N_s}^s$ and $X_1^t, \ldots, X_{N_t}^t$, their corresponding expressions in the bi-subspace $W_s$ and $W_t$ are, respectively,
$$y_i^s = W_s^T P_{s(i\mathrm{Col})}, \qquad y_j^t = W_t^T P_{t(j\mathrm{Col})},$$
where $P_{s(i\mathrm{Col})}$ is the $i$th column vector of $P_s$, $i = 1, \ldots, N_s$, and $P_{t(j\mathrm{Col})}$ is the $j$th column vector of $P_t$, $j = 1, \ldots, N_t$.
In this way, we finally realize the embedding from the original SPD manifold into the bi-subspace of the target Hilbert space (a finite-dimensional inner product space is a Hilbert space). This target space is isomorphic to Euclidean space, so in this embedding space we can perform linear operations on sample points, compute Euclidean metrics between samples, and so on. Although this domain adaptation framework is clear, how to determine the coefficient matrices is a question worth discussing, because different constraint criteria yield different domain adaptation algorithms, and different criteria of course address different issues; for example, the learning can be supervised or unsupervised. Machine learning based on subspace learning is widely used, and this paper applies SPD manifold tangent-space bi-subspace learning to domain adaptation. Assume $\mathbf{X}_s = \{X_1^s, \ldots, X_{N_s}^s\}$ is the labeled source domain dataset and $\mathbf{X}_t = \{X_1^t, \ldots, X_{N_t}^t\}$ is the unlabeled target domain dataset. The source and target domain datasets have different distributions. We hope to use the labels of the source domain data to identify the labels of the target domain data.
Domain adaptation based on bi-subspace learning transforms the source domain dataset $\mathbf{X}_s$ and the target domain dataset $\mathbf{X}_t$ into the bi-subspace so that their distributions in the bi-subspace are as similar as possible, and then discriminates the labels of the target domain data in the bi-subspace.
We transform the datasets $\mathbf{X}_s$ and $\mathbf{X}_t$ into the bi-subspaces $W_s$ and $W_t$ of the SPD manifold unit matrix tangent space $T_{I_D}\mathrm{Sym}_{++}^D$. According to the framework derived in part B of Section IV, the source and target domain data transformed into the bi-subspace take the forms $y_i^s = W_s^T P_{s(i\mathrm{Col})}$ and $y_j^t = W_t^T P_{t(j\mathrm{Col})}$. In part B of Section IV, we mentioned that the constraint criteria of DAL are open, and different algorithms can be obtained from different criteria. In this paper, MMD is used as the criterion for selecting the domain adaptation bi-subspace $W_s$ and $W_t$:
$$\min_{W} \left\| \frac{1}{N_s}\sum_{i=1}^{N_s} y_i^s - \frac{1}{N_t}\sum_{j=1}^{N_t} y_j^t \right\|^2 = \min_{W} \mathrm{tr}\big(W^T P M P^T W\big), \quad \text{s.t. } W^T P W = I_d,$$
where $P$ is the Gram matrix of all the transformed samples, $W$ collects the coefficient matrices $W_s$ and $W_t$, and $M$ is the MMD matrix with entries $M_{ij} = 1/N_s^2$ if $i$ and $j$ both index source samples, $M_{ij} = 1/N_t^2$ if both index target samples, and $M_{ij} = -1/(N_s N_t)$ otherwise. The constraint $W^T P W = I_d$ avoids trivial solutions. The physical meaning of this model is very clear: SPD data are mapped into the SMS so that linear and inner product operations can be performed, and the data are then projected from the SMS into its subspaces to achieve the domain adaptation effect. This is an unsupervised method, which requires no label information and saves the cost of manual labeling.
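The identity between the squared distance of the projected domain means and the trace form $\mathrm{tr}(W^T P M P^T W)$ can be checked numerically. Below is a minimal sketch with hypothetical stand-in matrices `P` and `W` (random, not learned); the construction of the MMD matrix $M$ follows the standard entry pattern stated above.

```python
import numpy as np

rng = np.random.default_rng(4)
Ns, Nt, d = 6, 8, 2
N = Ns + Nt

B = rng.standard_normal((N, N))
P = B @ B.T + np.eye(N)            # stand-in symmetric positive definite Gram matrix
W = rng.standard_normal((N, d))    # stand-in coefficient matrix

# MMD matrix M: 1/Ns^2 on the source-source block, 1/Nt^2 on the
# target-target block, and -1/(Ns*Nt) on the cross blocks.
e = np.concatenate([np.full(Ns, 1.0 / Ns), np.full(Nt, -1.0 / Nt)])
M = np.outer(e, e)

# Columns of Y are the sample coordinates y_i = W^T P_(iCol) in the subspace.
Y = W.T @ P
mmd = np.trace(W.T @ P @ M @ P @ W)
means_gap = Y[:, :Ns].mean(axis=1) - Y[:, Ns:].mean(axis=1)
assert np.isclose(mmd, means_gap @ means_gap)
```

The assertion confirms that minimizing the trace objective is exactly minimizing the squared Euclidean gap between the projected source and target means.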

D. LINEAR DISCRIMINANT ANALYSIS REGULARIZATION OF SOURCE DOMAIN DATA SUBSPACE LEARNING
The algorithm we propose uses bi-subspace learning for DAL, corresponding respectively to source domain data subspace learning and target domain data subspace learning. Since the source domain data X_s is labeled, the target domain data X_t is unlabeled, and the purpose of unsupervised DAL is to use the labeled source domain data to identify the unlabeled target domain data, we choose to add an LDA regularization term to the source domain data subspace learning, which also reflects the benefits of the bi-subspace learning we propose. According to the previous section, the source domain data X_s projected into the subspace W_s takes the form y^s_1, ..., y^s_{N_s} = y^s_{11}, ..., y^s_{1N_1}, ..., y^s_{C1}, ..., y^s_{CN_C}, where N_s = Σ_{c=1}^{C} N_c. The mean of the c-th category (y^s_{c1}, ..., y^s_{cN_c}) is μ_c = (1/N_c) Σ_{j=1}^{N_c} y^s_{cj}, and the mean of all data is μ = (1/N_s) Σ_{i=1}^{N_s} y^s_i. Let the inter-class divergence be S_b = Σ_{c=1}^{C} N_c (μ_c − μ)(μ_c − μ)^T, and let the within-class divergence be S_w = Σ_{c=1}^{C} Σ_{j=1}^{N_c} (y^s_{cj} − μ_c)(y^s_{cj} − μ_c)^T. Therefore, in our algorithm, the model with the LDA regularization term added to source domain data subspace learning is as follows, where D is a diagonal matrix with D_ii = 1 if x_i ∈ X_s, and D_ii = 0 otherwise.
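As a concrete illustration, the within-class and between-class scatter of the projected source data can be computed as follows (a NumPy sketch; names and shapes are our own, with Y holding projected samples as rows):

```python
import numpy as np

def scatter_matrices(Y, labels):
    """Within-class (S_w) and between-class (S_b) scatter of the
    projected source-domain data, as used by the LDA regularizer."""
    mu = Y.mean(axis=0)
    d = Y.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)
        Sw += (Yc - mc).T @ (Yc - mc)               # spread around class means
        Sb += len(Yc) * np.outer(mc - mu, mc - mu)  # spread of class means
    return Sw, Sb
```

A useful sanity check is the classical identity S_w + S_b = total scatter around the global mean.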

E. SOLUTION TO SMSBL-DA
In this way, the overall objective function of SMSBL-DA is obtained. Let L = PMP^T + λD(S_w − S_b); then the above objective function can be rewritten in trace form. Because the matrix P is symmetric positive definite, it can be expressed as P = UΣU^T with UU^T = I. Let N = Σ^{-1/2} U^T L U Σ^{-1/2}; obviously, N is at least a symmetric positive semi-definite matrix. Here we use the properties of the generalized Rayleigh quotient to solve the above problem. V is the matrix whose columns are the orthonormal eigenvectors corresponding to the d smallest eigenvalues of N. After the matrix V is obtained, the matrix W is W = UΣ^{-1/2}V. The solution process of SMSBL-DA is displayed in Algorithm 1.
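The whitening-and-eigendecomposition step can be sketched as follows (an illustrative NumPy version under the assumption that P is SPD; the function name is ours). Note that the returned W automatically satisfies the constraint W^T P W = I_d:

```python
import numpy as np

def solve_subspace(L, P, d):
    """Factor P = U diag(s) U^T, whiten L into
    N = s^{-1/2} U^T L U s^{-1/2}, keep the eigenvectors of N with
    the d smallest eigenvalues, and return W = U s^{-1/2} V."""
    s, U = np.linalg.eigh(P)            # P symmetric positive definite
    S_inv_half = np.diag(1.0 / np.sqrt(s))
    N = S_inv_half @ U.T @ L @ U @ S_inv_half
    N = (N + N.T) / 2                   # guard against round-off asymmetry
    w, V = np.linalg.eigh(N)            # eigh returns ascending eigenvalues
    V = V[:, :d]                        # eigenvectors of the d smallest
    return U @ S_inv_half @ V
```

With P = I this reduces to ordinary eigendecomposition of L, and in general W^T P W = V^T V = I_d by construction.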

Algorithm 1 SMSBL-DA
Input: source domain data samples X_s, target domain data samples X_t, the label information of X_s, and the parameter λ. Output: the label information of X_t.
1. Map the source domain data X_s and the target domain data X_t to SMS. 2. Calculate the matrices P, M, S_b, S_w, D, and N.
3. Perform eigenvalue decomposition on N, take the orthonormal eigenvectors corresponding to the d smallest eigenvalues of N as the columns of V to obtain the matrix V, and get the subspace coefficient matrix W from W = UΣ^{-1/2}V. 4. Project X_s and X_t into their respective subspaces to obtain the data y^s_i = W_s^T P_{s(iCol)} and y^t_j = W_t^T P_{t(jCol)}. Classify y^t_j in the subspace by 1-NN, with y^s_i as reference.
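Step 4 of the algorithm can be sketched as a plain nearest-neighbor lookup in the subspace (NumPy sketch with our own names; Ys and Yt hold projected samples as rows):

```python
import numpy as np

def knn1(Ys, labels_s, Yt):
    """1-NN labeling of projected target samples, using the projected,
    labeled source samples as the reference set (step 4 of Algorithm 1)."""
    # pairwise squared Euclidean distances, target x source
    d2 = ((Yt[:, None, :] - Ys[None, :, :]) ** 2).sum(axis=-1)
    return labels_s[np.argmin(d2, axis=1)]
```

Each target sample simply inherits the label of its closest projected source sample.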

F. COMPLEXITY ANALYSIS
The time complexity of SMSBL-DA consists of four main parts: mapping the source domain data X_s and the target domain data X_t into SMS, calculating the matrices P, M, and D, calculating the matrices S_b and S_w, and solving the eigenvalue problem. Here we discuss the time complexity of these main steps. The time complexity of the mapping from the original SPD manifold to SMS is O(Nn^3 log(n)); the time complexity of calculating the inner product matrix P is O(N^2 n^3).

V. COMPARISON TO OTHER RELATED ADVANCED ALGORITHMS
A. COMPARE WITH TCA
The essence of TCA [46] is to learn a subspace that minimizes the distribution difference between the source domain data and the target domain data while maintaining the data characteristics. TCA first maps the original data into the RKHS through the kernel function, and then maps the feature data of the infinite-dimensional kernel space into an m-dimensional RKHS subspace by determining a subspace projection matrix W ∈ R^{(N_s+N_t)×m}. The model of TCA is min_W tr(W^T K M K W) + μ tr(W^T W), s.t. W^T K H K W = I_d. The constraint W^T K H K W = I_d maintains the variance of the data and the linear independence of the projection matrix, and tr(W^T W) controls the complexity of the projection matrix W ∈ R^{(N_s+N_t)×m}. Both SMSBL-DA and TCA use MMD, but SMSBL-DA adds an LDA regularization term to the source domain data subspace learning, which also reflects the benefits of the bi-subspace learning in SMSBL-DA. In addition, TCA is based on an RKHS subspace, while SMSBL-DA is based on the SMS bi-subspace.
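For reference, the MMD coefficient matrix M that appears in TCA-style objectives is a standard construction: tr(W^T K M K W) then equals the squared distance between the two domain means in the subspace. A minimal sketch (our own helper):

```python
import numpy as np

def mmd_matrix(n_s, n_t):
    """Standard MMD coefficient matrix: M = e e^T with
    e_i = 1/n_s for source samples and -1/n_t for target samples."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s),
                        np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)
```

The resulting blocks have entries 1/n_s^2 (source-source), 1/n_t^2 (target-target), and -1/(n_s n_t) (cross terms).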

B. COMPARE WITH SSTCA
SSTCA [46] is a semi-supervised learning method based on TCA. SSTCA minimizes the probability distribution difference between the source domain data and the target domain data in the embedding space; it introduces HSIC to estimate the dependence between the labels and the data more accurately, and maintains the local geometric structure of the data by introducing a manifold regularization term. The model of SSTCA is:

s.t. W^T P H K_{yy} H P W = I
SSTCA, like TCA, is based on RKHS subspace learning, and both adopt MMD. The main difference between SMSBL-DA and SSTCA is that SSTCA performs subspace learning in the RKHS, while SMSBL-DA uses SMS as the working space to perform DAL on the SPD manifold. By mapping SPD data into SMS, linear operations and inner product operations can be performed.

C. COMPARE WITH IGLDA
IGLDA [47] proposed a new mapping function to make the distribution difference between the source domain and target domain data as small as possible; through the kernel function, the source domain data and target domain data are mapped into a latent space. IGL refers to the idea of integrating global and local metrics. IGLDA aims to retain the local geometric structure of the labeled source domain data, and its purpose is to use the knowledge acquired in the source domain to solve problems in the target domain. The model of IGLDA is as follows, where c is the number of categories and N_l is the number of data in each category. The main difference between our algorithm and IGLDA is that we transfer the data from the SPD manifold to SMS through the log transformation. The log transformation changes the form of the data minimally: it only performs a logarithmic transformation on the eigenvalues of the matrix and hardly changes the form of the matrix. IGLDA, however, performs DAL in the RKHS and transforms the data into functions defined on the original space, which leads to relatively large changes in the form and nature of the data.

D. COMPARE WITH TIT
TIT [48] theoretically studied two strategies, namely distribution matching and knowledge adaptation. TIT minimizes MMD in the infinite-dimensional RKHS, but it is worth noting that the distribution matching part still uses the original version of MMD. The limitation of this approach is that the true probability distributions of the data are unknown, so the maximum mean discrepancy is estimated with empirical means in place of the true distributions. In the knowledge adaptation part, a manifold regularization term and a row-sparse regularization term are introduced. The model of TIT is given in [48]. The differences between SMSBL-DA and TIT are as follows: first, TIT is based on RKHS subspace learning, while SMSBL-DA maps the data from the original SPD manifold to SMS through the log transformation. Second, TIT transfers the data into a single subspace, while our algorithm projects the data into their respective subspaces. Finally, TIT uses a manifold regularization term similar to that of PCA, while SMSBL-DA uses an LDA regularization term for source domain data subspace learning.

E. COMPARE WITH GSL
GSL [49] is also an unsupervised domain adaptation algorithm; Guided Learning refers to a learning method motivated by feedback information. GSL does not use the MMD criterion, but simply minimizes the Bregman divergence between the two subspaces W_s and W_t as the basic model. GSL obtains more inherent information of the data by reconstructing the inter-domain data to guide the learning of the target domain subspace W_t. The model of GSL involves a classifier function. The differences between SMSBL-DA and GSL lie in the following: first, GSL finds the two subspaces in the original space, while SMSBL-DA finds the two subspaces in SMS. Second, in SMSBL-DA, SMS is the smallest linear extension of the SPD manifold, and the SPD manifold is a submanifold of SMS. Given the original SPD data, from the perspective of SMS these SPD data occupy only one corner, and we diffuse the SPD manifold to the entire space, that is, expand it to SMS.
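To illustrate what "distance between two subspaces" means here, the following sketch computes a simple, commonly used subspace discrepancy, the squared Frobenius norm of the projector difference; this is our own illustrative stand-in, not the exact Bregman divergence defined in [49]:

```python
import numpy as np

def subspace_gap(Ws, Wt):
    """Illustrative distance between two subspaces with orthonormal
    columns: squared Frobenius norm of the difference of their
    orthogonal projectors (a stand-in for GSL's Bregman divergence)."""
    Ps, Pt = Ws @ Ws.T, Wt @ Wt.T
    return np.linalg.norm(Ps - Pt, 'fro') ** 2
```

The gap is zero exactly when the two matrices span the same subspace, and grows as the subspaces diverge.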

VI. EXPERIMENTS
In this section, in order to verify the validity and effectiveness of the proposed DAL algorithm SMSBL-DA, we apply SMSBL-DA to object recognition tasks and face recognition tasks on five real-world datasets. For the object recognition tasks, we use COIL [57] and MSRC+VOC2007 [61]. For the face recognition tasks, we use the AR dataset [62], the ORL dataset [63], and the YALE dataset [60]. SMSBL-DA is compared with the five domain adaptation algorithms discussed in this paper: TCA [46], SSTCA [46], IGLDA [47], TIT [48], and GSL [49].

A. DATA PROCESSING AND CLASSIFICATION STRATEGY
Data processing: to convert the original images into SPD matrices, the region covariance matrices need to be calculated. For a given rectangular region R ⊂ F, let {f_i}, i = 1, ..., n be the d-dimensional feature vectors of the n pixels in the region. Then the d × d covariance matrix of the region R can be calculated as C_R = (1/(n − 1)) Σ_{i=1}^{n} (f_i − μ)(f_i − μ)^T, where μ is the mean of {f_i}, i = 1, ..., n. Classification strategy: for each classification task, the training set is all of the source domain data, and the test set is all of the target domain data. In each recognition task, our algorithm and the comparison algorithms adopt SPD feature matrices or feature vectors as the data of the two domains, but the final feature representations obtained by all of these algorithms are vectors, such as y^s_i, i = 1, ..., N_s and y^t_j, j = 1, ..., N_t for our algorithm. For the classification results, i.e., the labels, of the 16 SPD matrices (or feature vectors) generated by a single target domain image, the one label accounting for the largest proportion among the 16 results is taken as the final classification result of the image.
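The region covariance computation above can be sketched directly (a NumPy version with our own names; F holds the n pixel feature vectors as rows):

```python
import numpy as np

def region_covariance(F):
    """Region covariance descriptor: F is (n, d), the d-dimensional
    feature vectors of the n pixels in a rectangular region R.
    Returns the d x d sample covariance C_R."""
    mu = F.mean(axis=0)
    Fc = F - mu
    return Fc.T @ Fc / (len(F) - 1)
```

This matches NumPy's own `np.cov` with its default normalization, which is a convenient correctness check.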

B. COIL DATASET OBJECT RECOGNITION TASK EXPERIMENTAL RESULTS AND ANALYSIS
The COIL dataset [57] contains 1440 grayscale images of 20 objects viewed from different angles. The camera takes a picture of each object every 5 degrees, so each object has 72 pictures, and the image size is uniformly 128×128. We divide the COIL dataset into 4 subsets named COIL1, COIL2, COIL3, and COIL4, which contain all images taken in the direction ranges [0, 85], [90, 175], [180, 265], and [270, 355], respectively. Fig. 2 shows the 20 categories of objects in the COIL dataset, and Fig. 3 shows examples of one object from the 4 domains. Each domain can be regarded as the source domain or the target domain, so we have 12 different tasks, as shown in the first column of Table 1.
In this experiment, based on the region covariance descriptor [58], we adopted a ''rich feature extraction'' strategy to extract SPD feature matrices from the images. First, the images are divided into regions. For the COIL dataset, the images keep their original 128 × 128 size, the region window size is set to 47 × 47, and the horizontal and vertical window sliding steps are both 27, so an image produces 16 overlapping regions. Then, for each pixel (x, y) of the image, we define a 17-dimensional feature vector f_{x,y} = [x, y, I(x, y), I_x, I_y, I_xx, I_yy, sqrt(I_x^2 + I_y^2), arctan(|I_x|/|I_y|), G_1(x, y), ..., G_8(x, y)], where I(x, y) is the gray value of the pixel (x, y), I_x and I_y are the first-order derivatives of the image in the horizontal and vertical directions, I_xx and I_yy are the corresponding second-order derivatives, and G_i(x, y), i = 1, ..., 8 are the responses of a set of 8 DOOG filters on I(x, y) [59]. Finally, we calculate the region covariance matrix for each region block, obtaining 16 SPD matrices of size 17 × 17 from each picture. Since the five comparison algorithms TCA, SSTCA, IGLDA, TIT, and GSL all process Euclidean feature vector data, for a fair comparison we adopt the method in [60] to vectorize the upper (or lower) triangular part, including the diagonal elements, of each region covariance matrix, so the dimension of the feature vectors used by the comparison algorithms on the COIL dataset is 17 × (17 + 1) ÷ 2 = 153. The experimental settings and parameter selection of our algorithm and the comparison algorithms are as follows: we uniformly choose the linear kernel for the kernel function of the comparison algorithms, the two subspace dimensions of our algorithm and the subspace dimension of the comparison algorithms are all set to 1000, the classifier is the 1-NN classifier, and each task is repeated 10 times.
SMSBL-DA: λ = 1; TCA: µ = 1; SSTCA: µ = 1, λ = 5 × 10^5; IGLDA: µ = 1, λ = 100; TIT: µ = 10^{-14}, λ = 10^{-10}, β = 1; GSL: λ = 0.1, β = 4 × 10^3.
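The half-vectorization that produces the 153-dimensional comparison features can be sketched in a few lines (our own helper name):

```python
import numpy as np

def half_vectorize(C):
    """Vectorize the upper triangle (including the diagonal) of a
    symmetric matrix; a 17x17 region covariance matrix gives
    17 * (17 + 1) // 2 = 153 features."""
    iu = np.triu_indices(C.shape[0])
    return C[iu]
```

Because the matrix is symmetric, the upper triangle retains all the information in the covariance descriptor.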
The experimental results are shown in Table 1; the bolded data mark the best result in each task, and the 12 tasks of the COIL dataset are numbered 1-12 in Fig. 4 following the order in the first column of Table 1. According to the results in the table, all six algorithms perform quite well on the 12 tasks of the COIL dataset, but SMSBL-DA is still better than the other five comparison algorithms, and its average classification accuracy reaches the highest value of 94.56%. Compared with TCA, SSTCA, IGLDA, TIT, and GSL, SMSBL-DA improves the average classification accuracy by 7.80%, 16.87%, 9.00%, 5.24%, and 5.88%, respectively. The improvement is most significant over SSTCA, reaching 17.42%, and SMSBL-DA is still about 3.17% higher than TIT, the best-performing comparison algorithm. This shows that SMSBL-DA has good robustness and stability.
C. MSRC+VOC2007 DATASET OBJECT RECOGNITION TASK EXPERIMENTAL RESULTS AND ANALYSIS
MSRC (M), provided by Microsoft Research Cambridge, contains 4323 color images of 18 classes of objects. VOC2007 (V) contains 5011 color images of 20 classes of objects. MSRC consists of standard images for benchmark evaluation, while VOC2007 consists of arbitrary photos from Flickr, so we can treat them as two different domains. Fig. 5 shows some examples of six categories from the two datasets. In our experiment, we selected the six common categories of objects shared by the two datasets: airplanes, bicycles, birds, cars, cows, and sheep. In each domain, we randomly selected 50 images per category, giving 300 images per domain. We thus have only two tasks in this experiment, namely M→V and V→M.
In this experiment, based on the region covariance descriptor [58], we adopted a ''rich feature extraction'' strategy to extract SPD feature matrices from the images. First, the images are divided into regions. For the MSRC+VOC2007 dataset, we uniformly resized each image to 300 × 300, the region window size is set to 120 × 120, and the horizontal and vertical window sliding steps are both 60, so an image produces 16 area blocks. Then, for each pixel (x, y) of the image, we define the same 17-dimensional feature vector as for the COIL dataset: [x, y, I(x, y), I_x, I_y, I_xx, I_yy, sqrt(I_x^2 + I_y^2), arctan(|I_x|/|I_y|), G_1(x, y), ..., G_8(x, y)], where I(x, y) is the gray value of the pixel (x, y), I_x and I_y are the first-order derivatives of the image in the horizontal and vertical directions, I_xx and I_yy are the corresponding second-order derivatives, and G_i(x, y), i = 1, ..., 8 are the responses of a set of 8 DOOG filters on I(x, y) [59]. Finally, we calculate the region covariance matrix for each region block, obtaining 16 SPD matrices of size 17 × 17 from each picture. Since the five comparison algorithms TCA, SSTCA, IGLDA, TIT, and GSL all process Euclidean feature vector data, for a fair comparison we adopt the method in [60] to vectorize the upper (or lower) triangular part, including the diagonal elements, of each region covariance matrix, so the dimension of the feature vectors used by the comparison algorithms on the MSRC+VOC2007 dataset is 17 × (17 + 1) ÷ 2 = 153. The experimental settings and parameter selection of our algorithm and the comparison algorithms are as follows: we uniformly choose the linear kernel for the kernel function of the comparison algorithms, the two subspace dimensions of our algorithm and the subspace dimension of the comparison algorithms are all set to 1000, the classifier is the 1-NN classifier, and each task is repeated 10 times.
SMSBL-DA: λ = 1; TCA: µ = 10^{-10}; SSTCA: µ = 10^{-10}, λ = 10^{-10}; IGLDA: µ = 1, λ = 100; TIT: µ = 10^{-14}, λ = 10^{-10}, β = 1; GSL: λ = 1, β = 10^3.
The experimental results are shown in Table 2; the bolded data mark the best result in each task, and the 2 tasks of the MSRC+VOC2007 dataset are numbered 1-2 in Fig. 6 following the order in the first column of Table 2. Of the two tasks on the MSRC+VOC2007 dataset, SMSBL-DA performs best on one task and TIT performs best on the other. However, the average classification accuracy of SMSBL-DA is still the highest, reaching 59.20%, which is 0.30% higher than TIT, the best of the comparison algorithms. Compared with TCA, SSTCA, IGLDA, TIT, and GSL, SMSBL-DA improves the average classification accuracy by 5.55%, 7.00%, 13.80%, 0.30%, and 8.40%, respectively. The largest increase in average classification accuracy is over IGLDA, reaching 13.80%. Therefore, SMSBL-DA has a certain stability and robustness on the MSRC+VOC2007 dataset.

D. AR DATASET FACE RECOGNITION TASK EXPERIMENTAL RESULTS AND ANALYSIS
The AR dataset [62] is a face recognition dataset widely used by various algorithms; it contains more than 4000 face photos of 126 people. This experiment selected the most commonly used subset, which contains grayscale face images of 100 people, 26 pictures per person, for a total of 2600 pictures of size 165 × 120. The 26 pictures of each person differ in expression, lighting, occlusion (sunglasses or scarf), and shooting date (taken two weeks apart), so the dataset meets the requirements of our domain adaptation tasks. Fig. 7 shows the 26 pictures of one person from the AR dataset, arranged in 2 rows and 13 columns; the 13 pictures in the first row are the first group, and the 13 pictures in the second row are the other group, taken two weeks later. In our experiment, we group the data of columns 1-4, columns 5-7, columns 8-10, and columns 11-13 according to the bare face, the bare face under changing light, the face with sunglasses, and the face with a scarf, numbered (1), (2), (3), and (4). We then use the data domains (1) and (2) as the source domains and the other three domains as the target domains, so this experiment has a total of 6 different tasks. In this experiment, based on the region covariance descriptor [58], we adopted a ''rich feature extraction'' strategy to extract SPD feature matrices from the images. First, the images are divided into regions. For the AR dataset, we uniformly resized each image to 120 × 120, the region window size is set to 30 × 30, and the horizontal and vertical window sliding steps are both 30, so a picture produces 16 non-overlapping area blocks.
Then, for each pixel (x, y) of the image, we define a 17-dimensional feature vector: f^{AR}_{x,y} = [x, y, I(x, y), I_x, I_y, I_xx, I_yy, sqrt(I_x^2 + I_y^2), arctan(|I_x|/|I_y|), G_1(x, y), ..., G_8(x, y)] (54) where I(x, y) is the gray value of the pixel (x, y), I_x and I_y are the first-order derivatives of the image in the horizontal and vertical directions, I_xx and I_yy are the corresponding second-order derivatives, and G_i(x, y), i = 1, ..., 8 are the responses of a set of 8 DOOG filters on I(x, y) [59]. Finally, we calculate the region covariance matrix for each region block, obtaining 16 SPD matrices of size 17 × 17 from each picture. Since the five comparison algorithms TCA, SSTCA, IGLDA, TIT, and GSL all process Euclidean feature vector data, for a fair comparison we adopt the method in [60] to vectorize the upper (or lower) triangular part, including the diagonal elements, of each region covariance matrix, so the dimension of the feature vectors used by the comparison algorithms on the AR dataset is 17 × (17 + 1) ÷ 2 = 153. The experimental results are shown in Table 3; the bolded data mark the best result in each task, and the 6 tasks of the AR dataset are numbered 1-6 in Fig. 8 following the order in the first column of Table 3. According to the results in the table, SMSBL-DA outperforms the other 5 comparison algorithms on all 6 tasks of the AR dataset, and its average classification accuracy is the highest of all algorithms, reaching 89.58%. Compared with TCA, SSTCA, IGLDA, TIT, and GSL, SMSBL-DA improves the average classification accuracy by 21.44%, 23.70%, 25.93%, 16.67%, and 25.58%, respectively; it is still about 17.02% higher than TIT, the best-performing comparison algorithm. The largest improvement in average classification accuracy is over IGLDA, reaching 25.93%, which is significant. This shows that our algorithm has a certain degree of stability and robustness.

E. ORL DATASET FACE RECOGNITION TASK EXPERIMENTAL RESULTS AND ANALYSIS
The ORL dataset [63] is also often used in face classification experiments in the field of face recognition; it is made up of the face images of forty volunteers. The 40 volunteers differ in race, gender, and age, and each volunteer collected
images under different conditions, such as different lighting, different expressions, and different shooting angles. Each person has 10 images, and the size of each image is 92 × 112. Fig. 9 shows all the samples of the 40 people in the ORL dataset. We use the letters (a)-(j) to represent the 10 images of each person, combine (a) and (b) into one domain numbered (A), and use (A) as the source domain with (c)-(j) as the target domains respectively, so we set up a total of 8 tasks. In this experiment, based on the region covariance descriptor [58], we adopted a ''rich feature extraction'' strategy to extract SPD feature matrices from the images. First, the images are divided into regions. For the ORL dataset, we uniformly resized each image to 30 × 30, the region window size is set to 8 × 8, and the horizontal and vertical window sliding steps are both 8, so an image produces 16 non-overlapping area blocks. Then, for each pixel (x, y) of the image, we define a 17-dimensional feature vector: [x, y, I(x, y), I_x, I_y, I_xx, I_yy, sqrt(I_x^2 + I_y^2), arctan(|I_x|/|I_y|), G_1(x, y), ..., G_8(x, y)], where I(x, y) is the gray value of the pixel (x, y), I_x and I_y are the first-order derivatives of the image in the horizontal and vertical directions, I_xx and I_yy are the corresponding second-order derivatives, and G_i(x, y), i = 1, ..., 8 are the responses of a set of 8 DOOG filters on I(x, y) [59]. Finally, we calculate the region covariance matrix for each region block, obtaining 16 SPD matrices of size 17 × 17 from each picture. Since the five comparison algorithms TCA, SSTCA, IGLDA, TIT, and GSL all process Euclidean feature vector data, for a fair comparison we adopt the method in [60] to vectorize the upper (or lower) triangular part, including the diagonal elements, of each region covariance matrix, so the dimension of the feature vectors used by the comparison algorithms on the ORL dataset is 17 × (17 + 1) ÷ 2 = 153.
The experimental settings and parameter selection of our algorithm and the comparison algorithms are as follows: we uniformly choose the linear kernel for the kernel function of the comparison algorithms, the two subspace dimensions of our algorithm and the subspace dimension of the comparison algorithms are all set to 400, the classifier is the 1-NN classifier, and each task is repeated 10 times. SMSBL-DA: λ = 1; TCA: µ = 10^{-26}; SSTCA: µ = 10^{-25}, λ = 10^{-7}; IGLDA: µ = 10^{-10}, λ = 2; TIT: µ = 10^{-6}, λ = 10^{-12}, β = 1; GSL: λ = 0.1, β = 5 × 10^3. The experimental results are shown in Table 4; the bolded data mark the best result in each task, and the 8 tasks of the ORL dataset are numbered 1-8 in Fig. 10 following the order in the first column of Table 4. According
to the results in the table, among the 8 tasks on the ORL dataset, SMSBL-DA performs best on 6 tasks, IGLDA on 2 tasks, and SSTCA on 1 task. SMSBL-DA still accounts for the highest proportion of best results, and its average classification accuracy reaches the highest value of 72.50%. Compared with TCA, SSTCA, IGLDA, TIT, and GSL, SMSBL-DA improves the average classification accuracy by 15.00%, 7.81%, 6.25%, 18.12%, and 16.56%, respectively. In particular, the improvement over IGLDA, the best-performing comparison algorithm, is still 6.25%. This shows that the classification effect of SMSBL-DA is stable and has a certain robustness.

F. YALE DATASET FACE RECOGNITION TASK EXPERIMENTAL RESULTS AND ANALYSIS
The YALE dataset [60] was created by Yale University and is often used in face classification experiments in the field of face recognition. It collects 165 face images of 15 volunteers; for each volunteer, images were collected in 11 situations differing in facial expression, posture, movement, and light direction. In the experiment, the pictures of the 11 situations are represented by the letters (a)-(k); (a) and (b) are combined into a domain named (A). We use the data domain (A) as the source domain, and (c)-(k) are respectively regarded as 9 target domains, so a total of 9 recognition tasks are performed. Fig. 11 shows examples from the 10 domains of face samples in the YALE dataset.
In this experiment, based on the region covariance descriptor [58], we adopted a ''rich feature extraction'' strategy to extract SPD feature matrices from the images. First, the images are divided into regions. For the YALE dataset, the region window size is set to 70 × 51, and the horizontal and vertical window sliding steps are both 8, so an image produces 16 area blocks. Then, for each pixel (x, y) of the image, we define a 17-dimensional feature vector: f^{YALE}_{x,y} = [x, y, I(x, y), I_x, I_y, I_xx, I_yy, sqrt(I_x^2 + I_y^2), arctan(|I_x|/|I_y|), G_1(x, y), ..., G_8(x, y)] (56) where I(x, y) is the gray value of the pixel (x, y), I_x and I_y are the first-order derivatives of the image in the horizontal and vertical directions, I_xx and I_yy are the corresponding second-order derivatives, and G_i(x, y), i = 1, ..., 8 are the responses of a set of 8 DOOG filters on I(x, y) [59]. Finally, we calculate the region covariance matrix for each region block, obtaining 16 SPD matrices of size 17 × 17 from each picture. Since the five comparison algorithms TCA, SSTCA, IGLDA, TIT, and GSL all process Euclidean feature vector data, for a fair comparison we adopt the method in [60] to vectorize the upper (or lower) triangular part, including the diagonal elements, of each region covariance matrix, so the dimension of the feature vectors used by the comparison algorithms on the YALE dataset is 17 × (17 + 1) ÷ 2 = 153. The experimental settings and parameter selection of our algorithm and the comparison algorithms are as follows: we uniformly choose the linear kernel for the kernel function of the comparison algorithms, the two subspace dimensions of our algorithm and the subspace dimension of the comparison algorithms are all set to 400, the classifier is the 1-NN classifier, and each task is repeated 10 times.
SMSBL-DA: λ = 1; TCA: µ = 10^{-10}; SSTCA: µ = 10^{-10}, λ = 10^{-2}; IGLDA: µ = 5 × 10^{-11}, λ = 10^{-7}; TIT: µ = 10^{-9}, λ = 1.1 × 10^{-7}, β = 1; GSL: λ = 0.1, β = 5 × 10^3. The experimental results are shown in Table 5; the bolded data mark the best result in each task, and the 9 tasks of the YALE dataset are numbered 1-9 in Fig. 12 following the order in the first column of Table 5. Among the 9 tasks on the YALE dataset, SMSBL-DA performs best on 7 tasks, IGLDA on 4 tasks, and GSL on 1 task (counting ties). SMSBL-DA still accounts for the highest proportion of best results, and its average classification accuracy reaches 91.85%, the highest among all algorithms. Compared with TCA, SSTCA, IGLDA, TIT, and GSL, SMSBL-DA improves the average classification accuracy by 21.48%, 30.00%, 15.55%, 22.22%, and 25.18%, respectively. On the YALE dataset, SMSBL-DA improves the average classification accuracy most markedly over SSTCA and GSL. At the same time, the classification accuracy of SMSBL-DA is relatively stable, lying between 80.00% and 100.00%. This shows that the classification effect of SMSBL-DA is stable and has a certain robustness.

TABLE 5. Classification accuracy (%) of each algorithm on the YALE dataset; the best result in each task is in bold, and the classifier is the 1-NN classifier.
FIGURE 12. The 9 tasks of the YALE dataset are numbered 1-9 following the order in the first column of Table 5.

VII. CONCLUSION
Traditional machine learning algorithms on the SPD manifold usually adopt the RKHS as the workspace, and most advanced DAL algorithms still deal with Euclidean data. Different from all of them, the algorithm proposed in this paper explores a new space, SMS, in which to perform machine learning (in this paper, DAL) on the SPD manifold, and it may be the first attempt to use SMS as the workspace for DAL on the SPD manifold. SMS is an inner product space, so it supports the linear operations and distance measurements needed by machine learning algorithms. After extracting the principal components of each SPD matrix, the SPD manifold can be transformed into SMS by the symmetric mapping, which is defined by the product of the principal components and their transposes, and SMS is the least linear expansion of the SPD manifold. The log transformation simply takes the logarithm of the eigenvalues in the eigenvalue diagonal matrix, so it changes the form and nature of the data minimally. In this paper, we concentrate on using the log transformation of the data from the SPD manifold to SMS for a DAL algorithm based on bi-subspace learning, which uses the minimized mean discrepancy criterion to match the distributions of the two domains. We also choose to add an LDA regularization term to the subspace learning of the source domain data, which reflects the benefits of the bi-subspace learning in our algorithm. Obviously, the log transformation is also suitable for other machine learning algorithms on the SPD manifold, so the proposed method has strong extensibility and applicability.