Active Discriminative Cross-Domain Alignment for Low-Resolution Face Recognition

In real application scenarios, face images captured by cameras often suffer from blur, illumination variation, occlusion, and low resolution (LR). These degradations pose a challenging problem for many real-time face recognition systems because of the large distribution difference between the captured degraded images and the high-resolution (HR) gallery images. Motivated by the widespread application of transfer learning in cross-domain visual recognition, we propose a novel active discriminative cross-domain alignment (ADCDA) technique for LR face recognition that jointly explores both the geometrical and statistical properties of the source domain and the target domain. Specifically, the proposed ADCDA-based method contains three key components: 1) it simultaneously reduces the domain shift in both the marginal distribution and the conditional distribution between the source domain and the target domain; 2) it aligns the data of the two domains in a common latent subspace by discriminant locality alignment (DLA); 3) it selects representative and diverse samples with an active learning strategy to further improve classification performance. Extensive experiments on six benchmark databases verify that the proposed method significantly outperforms other state-of-the-art methods.


I. INTRODUCTION
Face recognition is one of the most active research topics in the field of computer vision. Under controlled imaging conditions, most face recognition systems operating on HR face images are able to achieve satisfactory recognition performance. Unfortunately, in many practical scenarios, the performance of face recognition systems degrades dramatically when the captured face images are of low resolution. Therefore, how to improve the recognition performance on LR faces has gained much attention from researchers in cross-resolution face recognition.
Generally speaking, for many traditional machine learning-based image classification tasks, a fundamental assumption is that the samples from the source domain and the target domain have the same data distribution. Under this assumption, a classifier learned from the source domain is accurate enough for the images in the target domain. In practice, however, there is a very large distribution gap between the gallery images and the probe images due to degradation factors such as illumination variation, pose change, occlusion, and LR, resulting in poor performance of face recognition systems. Fortunately, transfer learning and domain adaptation have achieved great success in cross-modality and cross-domain image classification, which offers useful inspiration for LR face recognition.
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wei.
For many visual recognition tasks, it is crucial to obtain sufficient labeled data. Based on the availability of labeled data in the target domain, domain adaptation can be divided into semi-supervised [1]-[3] and unsupervised [4]-[8] domain adaptation. Since unlabeled images are easier to obtain in practice, unsupervised domain adaptation is of greater practical significance, and it is the focus of this paper.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
According to the survey in [9], existing transfer learning approaches can be roughly categorized into four groups, namely instance-based, feature-based, parameter-based, and relational-based approaches. Instance-based transfer learning approaches [10]-[14] are mainly based on an instance weighting strategy. Feature transformation-based approaches [15]-[19] either transform the source domain feature subspace into the target domain feature subspace, or transform the feature spaces of both source and target domains into a common latent subspace to adapt the two domains. However, instance-based approaches are not well suited to unsupervised visual recognition tasks because they need a large number of labeled source domain samples to train an accurate classifier for the target domain. Therefore, in this paper, we follow the feature transformation-based approaches.
Pan et al. [20] proposed transfer component analysis (TCA) to learn transfer components across domains in a Reproducing Kernel Hilbert Space (RKHS) with Maximum Mean Discrepancy (MMD), aiming to minimize the distribution shift across domains. To take advantage of low-dimensional structures that are intrinsic to many vision datasets, the work in [21] proposed a geodesic flow kernel (GFK) model to reduce domain shift by integrating an infinite number of subspaces that characterize changes in geometrical and statistical properties from the source domain to the target domain. JDA [22] improved TCA by combining both marginal and conditional distributions to decrease the discrepancy between the source domain and the unlabeled target domain. Fernando et al. [23] introduced a subspace alignment (SA) framework to improve the matching of cross-domain images, which builds subspaces for the source and target domains and learns a linear mapping to align the source subspace with the target subspace. In [24], TJM handles both the distribution difference and the irrelevant instances by jointly matching the features and reweighting the instances across domains in a principled dimensionality reduction procedure. Considering that existing unsupervised methods fail to take into account the difference between the two distributions in the subspace, Sun and Saenko [25] incorporated distribution alignment into subspace adaptation. To improve the alignment ability of the learned common subspace, the method in [26] imposes low-rank and sparse constraints on the reconstruction coefficient matrix, so that the global and local structures of data can be well preserved. Tuia and Camps-Valls [27] proposed to perform domain adaptation in a kernel-based feature space, where manifold alignment is performed to preserve the manifold geometric structures of both source and target domains. Ghifary et al. [28] presented scatter component analysis (SCA) to improve both domain adaptation and domain generalization by simultaneously maximizing the separability of classes and minimizing the mismatch between two domains. In [29], two coupled projections are learned to project the source domain and target domain data into a low-dimensional subspace by simultaneously reducing the geometrical shift and distribution shift. Wang et al. [30] proposed a class-specific reconstruction transfer learning (CRTL) model to exploit the intra-class dependency and inter-class independency of the reconstructed transfer matrix. In this method, low-rank and sparse constraints are imposed on the class-specific reconstruction coefficient matrix such that the global and local data structures that contribute to domain correlation can be effectively preserved.
Recently, deep adaptation networks [31]-[33] for domain adaptation have gained much attention. For example, Tzeng et al. [34] proposed to use a CNN for deep domain adaptation, where an adaptation layer and an additional domain confusion loss are introduced to learn a semantic and domain-invariant representation. In [35], Sun et al. extended CORAL [36] to learn a nonlinear transformation that aligns the correlations of layer activations in deep neural networks (called Deep CORAL). The works in [37] and [38] unite adversarial learning and domain adaptation to narrow the distribution difference between source and target domain data by combining discriminative modeling, weight sharing, and a GAN loss. Kang et al. [39] utilized a contrastive adaptation network (CAN) to optimize a new metric which explicitly models the intra-class domain discrepancy and the inter-class domain discrepancy.
In this paper, we develop a novel active discriminative cross-domain alignment (ADCDA) approach for LR face recognition. Our method is related to JDA [22] and JGSA [29] but has several distinctive features. First, the proposed method combines both geometrical and statistical properties to adapt the data distributions between the labeled source and unlabeled target domains. Different from the two previous methods, we simultaneously minimize the difference in both the marginal distribution and the conditional distribution between the two image domains, where Maximum Mean Discrepancy (MMD) [40] is employed to measure both differences. Moreover, inspired by the philosophy of subspace alignment proposed in [41], we explore both global and local manifold geometric structures in the source domain to obtain a more discriminative latent common subspace. In addition, we select more representative and diverse samples in the source domain to learn the latent common subspace, which benefits more accurate classification in the target domain. Taking these factors into consideration, we propose ADCDA, which combines active learning and domain adaptation for the LR face recognition task.
In summary, the major contributions of this paper are outlined as follows.
1) We reduce the domain shift between the labeled source and the unlabeled target domains by combining both geometrical and statistical properties.
2) We align the feature subspaces spanned by the labeled source and unlabeled target domains in a learned common latent subspace by integrating subspace learning with DLA.
3) To further improve the classification accuracy in the target domain, we employ active learning to select representative and diverse source samples to train our domain adaptation model.
4) Our proposed ADCDA approach achieves the best recognition performance on six standard face datasets when compared with eight state-of-the-art methods.
The rest of this paper is organized as follows. We briefly review the related work in Section II. Section III details the proposed Active Discriminative Cross-Domain Alignment (ADCDA). Experiments on six benchmark face datasets are presented in Section IV. Finally, Section V concludes this paper.

II. RELATED WORK

A. DISCRIMINANT LOCALITY ALIGNMENT
In the literature of machine learning, DLA [41] is an effective nonlinear dimensionality reduction algorithm. Unlike other classical dimensionality reduction algorithms such as principal component analysis (PCA) and linear discriminant analysis (LDA), DLA performs a discriminant dimensionality reduction by using both local alignment and global alignment. It shows a strong capability for reducing the dimensionality of samples with nonlinear distribution.
Generally, the DLA algorithm consists of three major stages: the part optimization, the sample weighting, and the whole alignment. In the phase of part optimization, the local patch constructed from the source domain samples contains rich discriminant information, where the local patch of each sample is associated with itself and its neighborhood samples belonging to the same class and different classes. In the phase of sample weighting, each partly optimized local patch is weighted by the margin degree to measure the importance of a given sample for classification. In the phase of whole alignment, the weighted and optimized local patches regarding each sample are globally aligned to further optimize the alignment matrix. Finally, a standard eigenvalue problem is solved to obtain the desired projection matrix for dimensionality reduction. The DLA shows good performance for high dimensional data with complicated nonlinear distribution.

B. ACTIVE LEARNING
In machine learning, active learning sampling [42] is popular due to its capacity to find a small number of representative and diverse labeled samples from a large number of training samples to train a more effective classifier. Two important criteria, namely representativeness and diversity, are widely considered in active learning sampling. Given a sample $x_i$, its representativeness is evaluated by a Gaussian kernel as in Eq. (1), computed over the set of original source domain samples $X = \{x_1, \ldots, x_{n_s}\}$. The bandwidth $\sigma_R$ of the Gaussian kernel is adaptively determined by Eq. (2), where $\rho$ is a scale coefficient. Fundamentally, a sample with higher representativeness shares more common information with the other samples than one with lower representativeness. The other criterion is diversity, which is evaluated by Eq. (3), where $D(x_i)$ represents the diversity of sample $x_i$ and $S$ is the set of selected informative samples. For face images, the regions of the eyes, nose, and mouth show diverse characteristics. Hence, we can utilize the diversity criterion to find samples with different appearances and thereby improve the generalization of the proposed method.
To balance representativeness and diversity, Hoi et al. [42] proposed to repeatedly find the most informative sample in the remaining candidate set $U = X - S$ by using the convex combination in Eq. (4), where $\lambda$ is a trade-off parameter between the representativeness and the diversity of samples.
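The greedy selection described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation. The function name `active_select`, the mean Gaussian-kernel similarity used for representativeness, the bandwidth rule (ρ times the mean pairwise distance), the nearest-selected-sample distance used for diversity, and the rescaling of both scores to a comparable range are all assumptions made for illustration. Only the convex combination with trade-off λ over the candidate set U = X − S is taken from the text.

```python
import numpy as np

def active_select(X, n_select, lam=0.5, rho=1.0):
    """Greedily pick n_select row indices of X (shape (n, d)).

    All scoring formulas here are assumed for illustration; they are
    not the paper's Eqs. (1)-(4).
    """
    n = X.shape[0]
    # Pairwise Euclidean distances between all samples.
    sq = np.sum(X**2, axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    # Assumed bandwidth: rho times the mean pairwise distance.
    sigma = rho * dist[np.triu_indices(n, k=1)].mean()
    # Assumed representativeness: mean Gaussian-kernel similarity to all samples.
    rep = np.exp(-dist**2 / (2.0 * sigma**2)).mean(axis=1)
    rep = rep / rep.max()

    # Seed the selected set S with the most representative sample.
    selected = [int(np.argmax(rep))]
    while len(selected) < n_select:
        # Assumed diversity: distance to the closest already-selected sample.
        div = dist[:, selected].min(axis=1)
        if div.max() > 0:
            div = div / div.max()
        # Convex combination: lam weights representativeness, (1 - lam) diversity.
        score = lam * rep + (1.0 - lam) * div
        score[selected] = -np.inf  # only candidates in U = X - S may be chosen
        selected.append(int(np.argmax(score)))
    return selected
```

With `lam=0.0` the score after the seed depends on diversity only, so the second pick is the sample farthest from the seed; with `lam=1.0` the picks simply follow the representativeness ranking.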

III. ACTIVE DISCRIMINATIVE CROSS-DOMAIN ALIGNMENT
In this section, we first state the problem addressed by the proposed method. Next, we describe the ADCDA-based method in detail, including its objective function and optimization. Finally, the complete algorithm for ADCDA-based LR face recognition is summarized.

A. PROBLEM DEFINITION
We first introduce the terminology of transfer learning. The source domain data is denoted as $X_s \in \mathbb{R}^{D \times n_s}$ with marginal probability distribution $P_s(X_s)$, and the target domain data is denoted as $X_t \in \mathbb{R}^{D \times n_t}$ with marginal probability distribution $P_t(X_t)$, where $D$ is the dimension of the image feature, and $n_s$ and $n_t$ are the numbers of samples in the source and target domains, respectively. In this paper, we address an unsupervised domain adaptation problem: learning an accurate classifier with the help of sufficient labeled samples from the source domain $\mathcal{D}_s = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{n_s}, y_{n_s})\}$, and then applying it to classify the unlabeled samples in the target domain $\mathcal{D}_t = \{x_1, x_2, \ldots, x_{n_t}\}$. The feature subspace and label subspace are assumed to be shared by the source and target domains, whereas their marginal and conditional distributions differ. As a result, our aim is to reduce the distribution difference so that $P_s(X_s) \approx P_t(X_t)$ and $Q_s(Y_s \mid X_s) \approx Q_t(Y_t \mid X_t)$. However, the conditional probability distribution $Q_t(Y_t \mid X_t)$ of the target domain data is unknown. We address this problem by following the assumption made in JDA [22].

B. ADCDA FRAMEWORK
The flow chart of the proposed method is shown in Fig. 1. As illustrated, the proposed ADCDA-based LR face recognition method combines geometrical alignment and statistical distribution matching to lessen the domain shift between the labeled source and unlabeled target domains. Moreover, to further improve the generalization capability of the proposed method, we use active sampling rather than random sampling to choose more informative samples for training a more accurate classifier for the target domain. To accomplish LR face recognition, we conduct domain adaptation by finding two coupled mappings, namely A for the source domain and B for the target domain, which project the feature spaces of the source and target domains into a common latent subspace.

1) SOURCE DOMAIN GEOMETRIC INFORMATION PRESERVING
The success of DLA [41] motivated us to explore the discriminative subspace spanned by the source domain. To this end, we combine local alignment, sample weighting, and global alignment to optimize a transformation matrix such that the discriminative information of the source domain can be well preserved. The local alignment is given by Eq. (5), where $\mu$ is a scaling factor in $[0, 1]$ that balances the importance of intra-class and inter-class samples with respect to $x_i^s$; $k_w$ and $k_b$ are the numbers of nearest neighbors of $x_i^s$ belonging to the same class and to different classes, respectively. The objective is expressed through a coefficient vector of the $i$-th local patch and the corresponding local alignment matrix.
To obtain better alignment, a margin degree function $m_i$ is introduced to penalize the samples near the classification boundary. The margin degree $m_i$ of the $i$-th sample $x_i^s$ is given by Eq. (6), where $n_i$ is the number of samples $x_j^s$ within a neighborhood of $x_i^s$ whose labels differ from the label of $x_i^s$, $\delta$ is a regularization parameter, and $t$ denotes a scaling factor [41]. In this way, the part optimization of the $i$-th local patch is weighted by the margin degree of the $i$-th sample $x_i^s$, as in Eq. (7). For each local patch $X_i^s$, Eq. (7) is used to weight the part optimization to obtain the discriminative alignment matrix. Next, the part optimizations of all the local patches are unified into a whole by assuming that the coordinate of the $i$-th local patch is selected from the global coordinate $X_S = [X_1^s, \ldots, X_{n_s}^s]$ [41], as in Eq. (8), where $S_i \in \mathbb{R}^{n_s \times (k_w + k_b + 1)}$ is the selection matrix defined in Eq. (9), and $F_i = \{i, i_1, \ldots, i_{k_w}, i^1, \ldots, i^{k_b}\}$ is the index set of the $i$-th local patch. By incorporating the selection matrix of Eq. (9) into Eq. (7), the alignment objective can be rewritten as Eq. (10). By summing over all the part optimizations in Eq. (10), we obtain the whole alignment in Eq. (11), which involves the discriminative alignment matrix of size $n_s \times n_s$. This matrix, which contains the local manifold structures, the discriminant information, and the label information of the source domain samples, is obtained by the iterative update in Eq. (12). Finally, the term that preserves the geometrical discriminative information of the labeled source domain samples is formulated in Eq. (13).
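As an illustration of the three stages (part optimization, sample weighting, and whole alignment), the sketch below assembles an alignment matrix with NumPy. It assumes the standard DLA forms from [41]: a coefficient vector with entries 1 for same-class neighbors and −µ for different-class neighbors, a margin degree exp(−1/((n_i + δ)t)), and accumulation of each weighted patch matrix into the rows and columns indexed by F_i. The function name, the default parameter values, and the median-distance neighborhood radius used to count n_i are hypothetical choices, not values taken from this paper.

```python
import numpy as np

def dla_alignment_matrix(X, y, k_w=2, k_b=2, mu=0.5, delta=1.0, t=1.0, radius=None):
    """Build an (n, n) alignment matrix from samples X (D, n) and labels y (n,).

    Formulas follow the usual DLA presentation and are assumptions here.
    """
    y = np.asarray(y)
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0))
    if radius is None:
        # Hypothetical neighborhood size: median pairwise distance.
        radius = np.median(dist[np.triu_indices(n, k=1)])

    L = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist[i])
        order = order[order != i]
        same = order[y[order] == y[i]][:k_w]   # nearest same-class neighbors
        diff = order[y[order] != y[i]][:k_b]   # nearest different-class neighbors
        F = np.concatenate(([i], same, diff))  # index set F_i of the patch

        # Coefficient vector: +1 pulls same-class neighbors, -mu pushes others.
        w = np.concatenate((np.ones(len(same)), -mu * np.ones(len(diff))))
        # Patch matrix L_i such that sum_j w_j ||z_i - z_j||^2 = tr(Z_i L_i Z_i^T).
        L_i = np.zeros((len(F), len(F)))
        L_i[0, 0] = w.sum()
        L_i[0, 1:] = -w
        L_i[1:, 0] = -w
        L_i[1:, 1:] = np.diag(w)

        # Margin degree: n_i counts differently labeled samples near x_i.
        n_i = np.sum((dist[i] <= radius) & (y != y[i]))
        m_i = np.exp(-1.0 / ((n_i + delta) * t))

        # Whole alignment: accumulate the weighted patch into the global matrix.
        L[np.ix_(F, F)] += m_i * L_i
    return L
```

Each patch matrix has zero row sums, so the assembled matrix is symmetric and annihilates the all-ones vector; with `mu=0` it reduces to a weighted graph Laplacian and is positive semidefinite.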

2) DOMAIN SHIFT MINIMIZATION
We utilize the Maximum Mean Discrepancy (MMD) [40] criterion to reduce the difference between the statistical distributions of the source and target domains. The distribution distance between the two domains is measured by MMD as in Eq. (14). In JDA [22], the authors proposed to apply a base classifier trained on the labeled source data to obtain pseudo labels of the target data, and to employ an iterative pseudo-label refinement strategy to minimize the difference between the conditional distributions of the two domains. In the proposed method, we follow JDA [22] to reduce the conditional distribution shift across the two domains, as in Eq. (15). To achieve effective and sufficient domain adaptation, we simultaneously reduce the shift in both the marginal distribution and the conditional distribution between the two domains. As such, the objective of minimizing the domain shift is formulated as Eq. (16). As in JGSA [29], Eq. (16) can be further transformed into the more concise matrix form of Eq. (17), whose constituent matrices are defined in Eqs. (18)-(21).
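To make the matrix form concrete, the following sketch builds the four MMD blocks for two coupled mappings and can be checked against the direct mean-difference computation. It assumes the linear-kernel MMD with separate projections A and B as formulated in JGSA [29]; the function name `mmd_blocks` and the variable names are hypothetical, and the target labels are pseudo labels produced by some base classifier.

```python
import numpy as np

def mmd_blocks(Xs, Xt, ys, yt_pseudo):
    """Return (Ms, Mt, Mst, Mts) for Xs (D, ns) and Xt (D, nt).

    Assumed form: tr(A^T Ms A) + tr(B^T Mt B) + tr(A^T Mst B) + tr(B^T Mts A)
    equals the marginal plus class-conditional squared mean differences
    between A^T Xs and B^T Xt.
    """
    ys = np.asarray(ys)
    yt_pseudo = np.asarray(yt_pseudo)
    ns, nt = Xs.shape[1], Xt.shape[1]

    # Marginal term: difference between the projected domain means.
    Ls = np.full((ns, ns), 1.0 / ns**2)
    Lt = np.full((nt, nt), 1.0 / nt**2)
    Lst = np.full((ns, nt), -1.0 / (ns * nt))

    # Conditional terms: one per class that appears in both domains.
    for c in np.intersect1d(ys, yt_pseudo):
        es = (ys == c).astype(float) / np.sum(ys == c)
        et = (yt_pseudo == c).astype(float) / np.sum(yt_pseudo == c)
        Ls += np.outer(es, es)
        Lt += np.outer(et, et)
        Lst -= np.outer(es, et)

    Ms = Xs @ Ls @ Xs.T
    Mt = Xt @ Lt @ Xt.T
    Mst = Xs @ Lst @ Xt.T
    return Ms, Mt, Mst, Mst.T
```

Classes without any pseudo-labeled target sample are skipped, since their class-conditional mean in the target domain is undefined.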

3) SUBSPACE ADAPTATION
To simultaneously minimize the distribution difference and match the two domains, similar to SA [23] and SDA [25], we make the feature subspaces of the source and target domains as close as possible so that the domain shift can be effectively reduced. To reduce the distribution discrepancy across domains, the SA-based method [23] introduces an additional transformation matrix M to project the source feature subspace into the target feature subspace. Different from SA [23], we minimize the cross-domain distribution difference by learning two coupled mappings (A and B), which convert the feature spaces of the two domains into a common latent subspace. The adaptation of the two coupled mappings is formulated as the minimization in Eq. (22). To simultaneously explore the shared and domain-specific features across domains, we incorporate Eq. (22) into Eq. (17) to adapt the geometrical and statistical distributions of the source and target domains.
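Assuming, as in JGSA [29], that the subspace adaptation term is the squared Frobenius distance between the two mappings, it can be written as a single trace over the stacked mappings, which is the shape needed to merge it with a block-matrix MMD term. The identity below is a short worked derivation under that assumption only:

```latex
\begin{aligned}
\|A - B\|_F^2
  &= \operatorname{tr}(A^T A) - 2\operatorname{tr}(A^T B) + \operatorname{tr}(B^T B) \\
  &= \operatorname{tr}\!\left(
       \begin{bmatrix} A^T & B^T \end{bmatrix}
       \begin{bmatrix} I & -I \\ -I & I \end{bmatrix}
       \begin{bmatrix} A \\ B \end{bmatrix}
     \right)
\end{aligned}
```

Under this form, the term adds a multiple of the identity to the diagonal blocks and subtracts it from the off-diagonal blocks of the combined matrix.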

4) ACTIVE SAMPLING
To further boost the performance of the proposed DCDA method, we employ active sampling, reviewed in Section II-B, to select more informative labeled samples in the source domain for training an accurate classifier for the unlabeled samples in the target domain. The benefit of ADCDA over DCDA is evaluated in the experimental section.

C. OBJECTIVE FUNCTION AND OPTIMIZATION
By combining Eqs. (13), (17), and (22), the final objective function of the proposed method is given by Eq. (23), where $\alpha$ and $\beta$ are two trade-off parameters that balance the importance of the different terms; $C_s = I_s - \frac{1}{n_s}\mathbf{1}_s\mathbf{1}_s^T$ and $C_t = I_t - \frac{1}{n_t}\mathbf{1}_t\mathbf{1}_t^T$ are the centering matrices; and $\mathbf{1}_s \in \mathbb{R}^{n_s}$ and $\mathbf{1}_t \in \mathbb{R}^{n_t}$ are column vectors of all ones.
Afterwards, two coupled mappings are constructed to match the source and target domains in a learned common latent subspace, which leads to the optimization problem in Eq. (24). Similar to JDA [22], to obtain a more concise expression, let $G^T = [A^T\ B^T]$. Thus, the objective function in Eq. (24) can be rewritten as Eq. (25). Next, according to constrained optimization theory, we let $\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_k) \in \mathbb{R}^{k \times k}$ be the Lagrange multiplier matrix and derive the Lagrange function of problem (25) as Eq. (26). By setting $\partial L / \partial G = 0$, we obtain the generalized eigenvalue decomposition in Eq. (27). Finally, the optimal adaptation matrix $G = [g_1, g_2, \ldots, g_k]$ is obtained by solving Eq. (27) for the eigenvectors associated with the $k$ smallest eigenvalues. The complete procedure of the ADCDA-based method for LR face recognition is summarized in Algorithm 1.
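The final step, solving a generalized eigenvalue problem for the k smallest eigenvalues and splitting the stacked solution into the two mappings, can be sketched as follows. The matrices `P` (objective side) and `Q` (constraint side) are hypothetical placeholders for whatever the two sides of the eigenproblem contain, with `Q` assumed symmetric positive definite. The nearest-neighbor rule used to label the projected target samples is also an assumption, since the text only refers to an adaptive classifier f.

```python
import numpy as np

def solve_coupled_mappings(P, Q, D, k):
    """Solve P g = lambda Q g and keep the k smallest eigenpairs.

    P: symmetric (2D, 2D); Q: symmetric positive definite (2D, 2D).
    Returns A (first D rows of G), B (last D rows of G), and eigenvalues.
    """
    # Whitening with the Cholesky factor Q = C C^T gives a standard
    # symmetric eigenproblem for C^{-1} P C^{-T}.
    C = np.linalg.cholesky(Q)
    C_inv = np.linalg.inv(C)
    S = C_inv @ P @ C_inv.T
    S = 0.5 * (S + S.T)                # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    G = C_inv.T @ vecs[:, :k]          # map back: columns satisfy G^T Q G = I
    return G[:D], G[D:], vals[:k]

def nearest_neighbor_labels(Zs, ys, Zt):
    """Assign each column of Zt the label of its closest column in Zs."""
    d2 = (np.sum(Zs**2, axis=0)[:, None]
          + np.sum(Zt**2, axis=0)[None, :]
          - 2.0 * Zs.T @ Zt)
    return np.asarray(ys)[np.argmin(d2, axis=0)]
```

Given the mappings, the embeddings would be computed as `Zs = A.T @ Xs` and `Zt = B.T @ Xt`, and the labels returned by `nearest_neighbor_labels(Zs, ys, Zt)` could serve as pseudo labels for the next refinement round.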

Algorithm 1 (Active) Discriminative Cross-Domain Alignment (DCDA and ADCDA)
Input: The source domain face images $X_s \in \mathbb{R}^{D \times n_s}$, the target domain face images $X_t \in \mathbb{R}^{D \times n_t}$, the class labels of the source domain face images $Y_s \in \mathbb{R}^{1 \times n_s}$, the dimension $k$ of the subspace, and the model parameters $\lambda$, $\alpha$, and $\beta$.
Output: Adaptation transformation matrices $A$ and $B$, embeddings $Z_s$ and $Z_t$, and adaptive classifier $f$.

IV. EXPERIMENTS

A. BENCHMARK FACE DATABASES
In the experiment, we employ six benchmark face databases including YALE-B, UMIST, ORL, FERET, CMU-PIE, and AR.

B. EXPERIMENTAL RESULTS
The recognition performance of the proposed method and the eight state-of-the-art approaches on six face datasets is reported in Table 1. As shown in Table 1, except for slightly lower recognition rates than CDMMA on the YALE-B and AR face databases, the proposed DCDA method (which randomly divides the source and target domains) consistently achieves the best recognition performance on the remaining face databases at both resolutions. Moreover, when active learning is used to select source domain samples, the proposed ADCDA method significantly outperforms the other competitors on all the face databases at different resolutions. More specifically, the improvement of ADCDA is 9.48 percent and 4.83 percent on the YALE-B face database, 4.02 percent and 3.41 percent on the UMIST face database, 6.9 percent and 6.0 percent on the ORL face database, 19.85 percent and 19.08 percent on the FERET face database (except for HR-LDA), 2.17 percent and 2.46 percent on the CMU-PIE face database, and 9.59 percent and 4.84 percent on the AR face database, respectively. In particular, the recognition rate of our ADCDA-based method reaches 100 percent on the UMIST and CMU-PIE face databases. This performance is mainly attributed to jointly applying both geometrical and statistical properties, which not only reduces the geometrical shift of the subspaces but also minimizes the distribution shift between the two domains. Moreover, active sampling selects more informative samples from the source domain and therefore helps to enhance the generalization performance of the proposed DCDA-based method. In the following experiments, we only report the results obtained by the ADCDA-based method for comparison.

C. EXPERIMENTAL ANALYSIS ON DIMENSIONALITY k OF FEATURE SUBSPACE
In this subsection, we study how the dimension k (Dim) of the feature subspace affects the recognition performance. In this experiment, the rank is set to 1 for all databases. The size of the LR face images is set to 8 × 7 for the YALE-B and CMU-PIE face databases, 9 × 8 for the UMIST and ORL face databases, 8 × 8 for the FERET face database, and 7 × 6 for the AR face database. As shown in Fig. 3, the proposed ADCDA achieves its best performance when the highest feature dimension is used for matching. Except on the AR face database, the proposed approach consistently performs better than the other methods. Moreover, our method also exceeds the four transfer learning-based approaches.

D. EXPERIMENTAL ANALYSIS ON RANK
To examine how the rank influences the recognition rate, we vary the rank from 1 to 10. In this experiment, the experimental settings are the same as in Section IV-C. As shown in Fig. 4, for all benchmark face databases, the recognition rates of ADCDA and the other approaches steadily increase as the rank increases from 1 to 10. In particular, our method performs better than the other compared methods at every rank, except on the AR face database.

E. EXPERIMENTAL ANALYSIS ON DIFFERENT RESOLUTIONS
In this subsection, we further evaluate the performance of different methods at two resolution levels, namely 8 × 7 and 16 × 14 for the YALE-B and CMU-PIE face databases, 9 × 8 and 18 × 16 for the UMIST and ORL face databases, 8 × 8 and 10 × 10 for the FERET face database, and 7 × 6 and 14 × 12 for the AR face database. Fig. 5 shows the results of the different methods at the two resolution levels. Except on the AR face database, the proposed ADCDA method exceeds the other methods in all cases.

F. EFFECTIVENESS ANALYSIS OF FEATURE SUBSPACE
In this subsection, we further examine the effectiveness of our cross-domain alignment method by projecting the learnt common features of the six benchmark face databases into a 2-D space via PCA-based dimensionality reduction. The visualization results of the compared approaches are shown in Fig. 6. The results show that the features of the source and target domains transformed by the proposed ADCDA method are better aligned in the common feature subspace than those of the other methods. The main reasons are as follows. The classical TCA method only considers the marginal distribution of data without utilizing label information. The JDA method takes both the marginal distribution and the conditional distribution of data into consideration, but fails to reduce the geometrical shift during cross-domain adaptation. The TJM method minimizes the domain shift by jointly matching the features and reweighting the instances across domains but does not consider any discriminative information, leading to the worst discrimination and separability. Although the JGSA method can effectively reduce both the geometrical shift and the distribution shift, it only explores the global discriminative information by using LDA. The CLPMs method focuses on preserving the local geometric structures of the source domain samples in the latent common subspace. Nevertheless, it ignores the discriminative information of the samples in the source domain. The CDMMA approach explores both the local geometric structure information and the label information to improve the discrimination of the mappings. Unfortunately, many samples remain noticeably aggregated in the feature subspace. In contrast to the above competitors, the proposed ADCDA method significantly improves the discriminative capability and separability of the samples in the common latent subspace.
In particular, face images of the same class are well aligned and aggregated, while those from different classes are clearly separated from each other in the learnt common latent subspace. This is because the proposed method integrates the local patch optimization, the sample weighting, and the global alignment of samples into a unified discriminative alignment matrix, which helps to adapt to the nonlinear distribution of face images in the source and target domains and leads to a more discriminative feature subspace than those of the other competitors.
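The 2-D projection used for this kind of visualization can be sketched with a plain SVD-based PCA. The function name `pca_2d` is hypothetical, and fitting the principal axes jointly on the source and target embeddings is an assumption; the text only states that PCA is used to reduce the learnt common features to two dimensions.

```python
import numpy as np

def pca_2d(Zs, Zt):
    """Project embeddings Zs (k, ns) and Zt (k, nt) onto two shared PCA axes.

    Requires k >= 2. Returns two arrays of shape (2, ns) and (2, nt).
    """
    Z = np.hstack([Zs, Zt])
    # Center the pooled embeddings so the axes capture variance, not the mean.
    Zc = Z - Z.mean(axis=1, keepdims=True)
    # Left singular vectors are the principal directions, sorted by variance.
    U, _, _ = np.linalg.svd(Zc, full_matrices=False)
    Y = U[:, :2].T @ Zc
    ns = Zs.shape[1]
    return Y[:, :ns], Y[:, ns:]
```

Plotting the two returned arrays with different markers per domain and different colors per class gives a view of how well same-class samples from the two domains overlap.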

G. PARAMETER SENSITIVITY
In the proposed ADCDA, three important parameters, namely λ, α, and β, affect the performance of the algorithm. To investigate how these parameters influence the recognition performance, we take the YALE-B face database as the test case and repeat the experiments five times. We vary λ from 0 to 1 at an interval of 0.1, and empirically set α ∈ [0, 0.1] and β ∈ [0.001, 100] to search for the optimal parameters. Note that λ is a trade-off parameter between representativeness and diversity, where λ → 0 emphasizes the diversity of samples while λ → 1 emphasizes their representativeness. Fig. 7a illustrates how the recognition rate varies with λ. As shown in Fig. 7a, the proposed ADCDA-based method tends to obtain better performance with a smaller value of λ, which reflects the importance of sample diversity to the classifier. α is another trade-off parameter, which balances the importance of the geometric structures of samples. As shown in Fig. 7b, the recognition performance drops greatly when the geometric structures of samples are not considered. By contrast, the recognition rate rises rapidly when the geometric structures of samples are fully utilized. This shows that it is essential to explore both the intra-class and the inter-class information to improve the performance of transfer learning-based visual recognition tasks. In addition, β is used to adapt the geometrical and statistical distributions between the source and target domains. As shown in Fig. 7c, the recognition rate of the proposed method descends rapidly as β increases, which means that a smaller value of β is more beneficial to the recognition performance.

V. CONCLUSION
In this paper, we have proposed a novel ADCDA-based transfer subspace learning approach for LR face recognition. In the proposed method, both the marginal distribution and the conditional distribution of the source domain and the target domain are simultaneously explored to reduce the discrepancy between the two domains. To further improve the generalization performance of the proposed method, we employ active sampling to select more representative and diverse samples for subspace learning. Comprehensive experiments on six benchmark face databases have verified the effectiveness of the proposed method.
In the future, we will extend our model in two aspects. On the one hand, to enhance the robustness of the estimated pseudo labels of target domain samples, we can frame our method under a co-transfer ensemble learning framework [45]. On the other hand, to estimate a more accurate conditional distribution of the target domain, other semi-supervised manifold learning-based methods [46] can be utilized to optimize the pseudo labels.