Discriminative Face Hallucination via Locality-Constrained and Category Embedding Representation

Recent years have witnessed the rapid development of face image hallucination techniques. However, previous face hallucination methods are unsupervised and ignore the label information of the training samples, leading to undesirable results. This article proposes a locality-constrained and category embedding representation (LCER) method to super-resolve face images in a supervised manner by embedding the label information in the data representation. The proposed LCER incorporates the locality prior and the category information into one unified framework, which aims to combine the advantages of locality in preserving the true topological structure of the data manifold and of discriminability in exposing the class subspace information. Such a strategy allows the LCER not only to preserve sharper image details but also to guarantee that the face structure pattern is transferred mainly from the same subject in super-resolution reconstruction. Extensive experiments were conducted to evaluate the proposed LCER, and the comparative results demonstrate that it achieves superior face hallucination performance in both quantitative measurements and visual impressions compared to several state-of-the-art methods.

Face hallucination methods in the literature can be classified into two main types, namely, global face methods based on statistical models and local face methods based on small patch priors.
The statistical model-based global face methods try to reconstruct HR images based on the correlation mapping derived from the LR and HR face image pairs by leveraging face statistical models. Representative models include, but are not limited to, principal component analysis (PCA) [4], locality preserving projections (LPPs) [5], canonical correlation analysis (CCA) [6], singular value decomposition (SVD) [7], [8], and orthogonal procrustes regression (OPR) [9]. Though these algorithms are able to preserve the global structure of the human face well, the global-based methods share a common difficulty in preserving image details such as edge and texture information.
In contrast to global reconstruction, the local patch prior-based face approaches can further preserve image details by handling small patches rather than holistic images. The neighbor embedding (NE)-based method was the first attempt to study the correlations among neighbor patches for super-resolution reconstruction [10]. Jiang et al. [11] further presented a coupled-layer NE (CLNE) with graph regularization for facial image hallucination, where a more robust NE was achieved by iteratively updating the representation weights. The success of NE-based methods is attributed to the assumption that the LR and HR patch spaces maintain the same topological structure, which originates from the well-known locally linear embedding (LLE) [12] algorithm. Thus, the projection coefficients learned in the LR patch space can be used directly in the HR patch space to synthesize the desired HR patch.
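The NE pipeline described above (find neighbor LR patches, solve LLE-style weights, transfer them to the HR patch space) can be sketched as follows. This is a minimal illustration under the sum-to-one weighting assumption; the function and variable names, and the small ridge regularizer, are ours, not from [10]-[12]:

```python
import numpy as np

def ne_hallucinate_patch(y_l, X_l, X_h, k=5, reg=1e-6):
    """Sketch of neighbor-embedding (NE) super-resolution for one patch.

    y_l : (n,) vectorized input LR patch; X_l, X_h : (n, K) / (m, K)
    coupled LR/HR training patch matrices with aligned columns.
    """
    # 1) find the k nearest LR training patches
    d = np.linalg.norm(X_l - y_l[:, None], axis=0)
    idx = np.argsort(d)[:k]
    N_l, N_h = X_l[:, idx], X_h[:, idx]

    # 2) solve for reconstruction weights that sum to one (LLE-style),
    #    via the local Gram matrix of the centered neighbors
    Z = N_l - y_l[:, None]
    G = Z.T @ Z + reg * np.eye(k)        # tiny ridge for numerical stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()

    # 3) transfer the LR weights to the HR patch manifold
    return N_h @ w
```

The key design choice, shared by the methods in this survey, is that the weights are computed entirely in the LR space and reused unchanged in the HR space.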
Actually, the human face is highly structured, and similar patterns always tend to appear at the same position. Based on this observation, Ma et al. [13] introduced a least squares regression (LSR)-based position patch method for face hallucination, where the query LR patch is projected onto the space spanned by all the training patches seated at the identical location. A flaw of LSR is that it imposes no constraint on the representation coefficients, leading to unstable solutions. Up to now, various kinds of prior knowledge have been exploited to stabilize the linear system, among which sparsity and locality are the two representative priors.
The sparsity prior assumes that data are inherently sparse and can be approximated by a subset of the training samples [14]–[16]. Motivated by this, Yang et al. [17] developed a sparse representation model for face image super-resolution with the assistance of coupled LR and HR dictionaries. Li et al. [18] proposed to hallucinate face images by learning the sparse local-pixel structures of the target HR facial images. Wang et al. [19] introduced an l_p-norm induced weighted sparse representation model for face hallucination. Jiang et al. [20] presented a smooth sparse representation (SSR) to hallucinate face images, which tries to strengthen the smooth structure of the training data manifold. The collaborative representation has also been applied to face super-resolution, since it achieves performance comparable to sparse coding but at a much faster speed [21]. To take advantage of a deep architecture, Farrugia and Guillemot [22] proposed to hallucinate face images via a coupled-layer collaborative representation.
In contrast, the locality prior, induced by a weighted l2-norm, encourages the training samples similar to the test one to contribute significantly to the final result. Jiang et al. [23], [24] introduced the locality-constrained representation (LcR) scheme to model the position patches for face hallucination. The representation capacity of LcR was later enhanced by incorporating Tikhonov regularization [25]. Shi et al. [26], [27] employed the LcR model in a high-dimensional feature space induced by a kernel function, where the nonlinear characteristics of the original HR space can be captured to enhance the quality of the super-resolution results. The LcR framework was further extended into the quaternion domain [28], and a quaternion locality-constrained coding (QLC) was proposed for color face image super-resolution [29]. Later, Jiang et al. [30] presented a thresholding LcR with the strategy of reproducing learning (TLcR-RL) for face hallucination, in which the newly reconstructed face is added back to the training dataset to strengthen the distribution consistency between the query face and the training images. Considering that images are easily corrupted by noise in practice, Liu et al. introduced two different reweighting strategies into the locality model and presented a robust locality-constrained bilayer representation (RLcBR) [31] as well as an iterative relaxed collaborative representation (iRCR) [32] to hallucinate face images affected by impulse noise and Gaussian noise, respectively.
In recent years, deep learning techniques have been successfully applied to image super-resolution and face hallucination. For example, Dong et al. [33] proposed a convolutional neural network (SRCNN) for general image super-resolution, which was the first attempt to utilize deep models for super-resolution reconstruction by learning an end-to-end mapping between the LR and HR spaces. Liu et al. [34] introduced domain expertise into a cascade sparse coding-based network (CSCN) for super-resolution. Lu et al. [35] presented a deep linear mapping learning (DLML) framework for face hallucination. Kim et al. [36] proposed a very deep convolutional network for super-resolution (VDSR), in which cascaded small filters were used to extract context characteristics over large image regions. Ledig et al. [37] extended the generative adversarial network (GAN) to SRGAN for single image super-resolution. Yu and Porikli [38] adopted the decoder-encoder-decoder scheme and presented a transformative discriminative autoencoder for face hallucination. Generally, deep face hallucination tends to utilize certain pretrained deep models with fine-tuning on facial priors to generate temporary faces, which are further used to sharpen the hallucination results from the regression model [39], [40].

[Fig. 1. Discriminative face hallucination problem encountered in the face recognition system: the training images are partitioned into different classes, and the input probe image belongs to one of the classes in the training dataset but has low resolution. The low-resolution problem severely degrades the performance of the recognition system; improving the quality of test face images with the assistance of labeled training samples is therefore an urgent issue in the face hallucination field.]
Though the previous methods achieve promising performance to a certain extent, one common limitation is that they ignore the discriminative face hallucination problem that is frequently encountered in face recognition systems. The LR problem commonly exists in many practical face recognition systems, and the degraded input face images severely reduce the system performance (see Fig. 1) [41]. One intuitive way to promote the system performance is to improve the visual quality of the input face images by an appropriate face image super-resolution technique. Therefore, how to design discriminative face hallucination methods that improve the quality of test face images with the assistance of the labeled training samples in the recognition system becomes an urgent issue in the face hallucination field.
However, the aforementioned face hallucination methods mainly focus on the manifold structure assumption and rarely take into consideration the discriminative information of the training data, which significantly benefits image representation and classification applications [42], [43]. To address this concern, in this article, we propose a new discriminative face hallucination method named locality-constrained and category embedding representation (LCER) from a point of view different from previous ones. In LCER, the unsupervised locality-constrained coding and the supervised category embedding are united into one framework, which benefits our method in two ways. First, since the proposed LCER is a position-patch-based approach, it is reasonable to apply the nearest neighbors to represent the test patch. Moreover, for discriminative face super-resolution, since the label information of the training samples is known in advance (shown in Fig. 1), it is meaningful to encourage the samples with the same label to contribute more to the data representation. The advantages of using the locality prior as well as the label information allow the proposed LCER method to preserve more image details and desirable face patterns in super-resolution reconstruction. The contributions of this article are summarized in the following three points.
1) The proposed LCER employs the label information of the training samples in data representation, which guarantees that the face structure pattern is mainly transferred from the correct class in HR face synthesis. To the best of our knowledge, this is the first attempt to super-resolve face images in a supervised manner.
2) The discriminative term that exploits the label information for super-resolution representation in LCER is interpreted from the Bayes probability perspective. This not only ensures the rationality of the proposed model but also demonstrates the capacity of LCER in preserving the face subspace features from the statistical viewpoint.
3) We have carried out extensive experiments on both standard face datasets and real-world images to verify the efficiency and effectiveness of the proposed method in hallucinating both clean and noisy face images.
The remainder of this article is organized as follows. The proposed discriminative face hallucination model, as well as the corresponding optimization strategy, is introduced in Section II. The experimental results and analysis are presented in Section III. Section IV draws a conclusion.

A. LCER Model
In this article, we consider a different scenario, where the training samples contain face images that belong to the same class (person) as the input test one but with different illuminations and expressions. Such a scenario is necessary and meaningful, especially for video surveillance systems (see Fig. 1). In this case, face hallucination becomes the task of super-resolving the LR observation with training samples partitioned into different classes, to one of which the test face image belongs. Such a face image super-resolution problem with labeled training data is called discriminative face hallucination.
Suppose that the training LR and HR datasets are composed of C classes, where each face image is stretched into an N-dimensional column vector for the LR face and an M-dimensional column vector for the HR face, and the ith class includes J_i face images. Each image is partitioned into P × Q overlapping patches, where P is the number of overlapping patches in each row and Q is the number of overlapping patches in each column. Analogously, the LR and HR training image pairs can also be divided into small patch sets. Given an input LR patch y^L(i, j), it can be represented by a linear combination of the corresponding LR training image patches seated at the same position:

  min_{s(i,j)} ||y^L(i, j) − X^L(i, j)s(i, j)||_2^2.  (1)

Equation (1) is a linear regression problem, whose solution is [44]

  s(i, j) = (X^L(i, j)^T X^L(i, j))^{−1} X^L(i, j)^T y^L(i, j),  (2)

in which the superscript T denotes the transpose operator.
Unfortunately, (1) may yield unstable solutions, especially when K is larger than n. Thus, an additional regularization is needed in the objective function to penalize the representation coefficients:

  min_{s(i,j)} ||y^L(i, j) − X^L(i, j)s(i, j)||_2^2 + λ φ(s(i, j)).  (3)

One vital problem in (3) is the choice of the penalty function φ(·). Different settings of φ result in different regularization terms and hence in different representation models. For example, when φ is set as the l1-norm, (3) becomes the well-known sparse representation, while choosing φ to be the l2-norm turns (3) into the so-called collaborative representation. Though the work in [45] argues that the sparsity prior on the coefficient vector s(i, j) is important, the work in [46] claims that sparsity may not be so necessary, since it is the collaborative rather than the sparse representation that makes the model effective.
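As a concrete illustration of (3) with the l2-norm choice (the collaborative representation), the regularized coefficients have a closed form. A minimal NumPy sketch, with variable names ours:

```python
import numpy as np

def collaborative_code(y, X, lam=0.01):
    """Closed-form l2-regularized representation of patch y over dictionary X.

    Solves min_s ||y - X s||^2 + lam * ||s||^2, whose solution is
    s = (X^T X + lam I)^{-1} X^T y.  (Illustrative sketch.)
    """
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)
```

As lam approaches zero, this reduces to the unregularized least-squares solution of (1); larger lam trades reconstruction fidelity for stability.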
For discriminative face image super-resolution, since the label information is available in the training dataset, it is believed that, similar to supervised approaches [47], utilizing the label information can help to improve the representation capacity of face hallucination models. Motivated by this, we suggest the following discriminative model for face image super-resolution:

  min_{s(i,j)} ||y^L(i, j) − X^L(i, j)s(i, j)||_2^2 + λ φ(s(i, j)) + γ Σ_{c=1}^{C} dist(z^L_c, z^L),  (4)

where z^L_c is the estimation of y^L on the subspace spanned by the training samples from the cth class and z^L is the estimation of the input LR face y^L with respect to the whole training set.
Since both z^L_c and z^L are reconstructions of y^L associated with X_c and X, respectively, to emphasize the discriminative information we choose to minimize the Euclidean distance between X^L_c(i, j)s_c(i, j) and X^L(i, j)s(i, j); therefore, the third term in (4) is specified as

  Σ_{c=1}^{C} ||X^L_c(i, j)s_c(i, j) − X^L(i, j)s(i, j)||_2^2.  (5)

In the proposed LCER, the majority of the samples used to represent the query patch come from the same class. Moreover, some samples that belong to different classes but are similar to the test one are also used for data representation. In contrast, the LSR uses all the training samples, while the SR only controls the sparsity of the data encoding. The LcR takes into account the similarity between the test sample and the training samples, but without considering the class label information.
Equation (5) penalizes the summation of the differences between the reconstruction from each class and that from the whole training dataset. By minimizing (5), the input image patch y L (i, j) is encouraged to be represented by the training samples with the identical label associated with y L (i, j). This can induce discriminative information for representing the query image patch (the visual impression of LCER is shown in Fig. 2).
To further emphasize the contribution of each class in the representation, we add a weight into (5):

  Σ_{c=1}^{C} β_c ||X^L_c(i, j)s_c(i, j) − X^L(i, j)s(i, j)||_2^2,  (6)

where β_c is the weight assigned to the cth class, a class-specific parameter encoding certain prior knowledge. The l2-norm is chosen to serve as φ(·) to penalize the coefficients. However, different from the collaborative representation, which uses the l2-norm to regularize the coefficients directly, we prefer to regularize the weighted coefficients:

  λ ||τ(i, j) ⊙ s(i, j)||_2^2,  (8)

where ⊙ represents the Hadamard product and τ(i, j) is the weight vector that penalizes the distance between the test face image patch and each training patch. The kth entry of τ(i, j) is defined simply by the Euclidean distance

  τ_k(i, j) = ||y^L(i, j) − x^L_k(i, j)||_2.  (9)

As can be seen, if the kth training sample is close to the test patch, it is assigned a small weight, while a large weight is assigned to a training sample that is far away from the test one. Therefore, by using the Euclidean distances as weights, the nearest neighbors of the test patch among the training samples contribute more to the representation than the non-neighbors, which preserves the true topological structure of the patch manifold [48].
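The locality weights above turn the l2 penalty into a weighted ridge regression, which again admits a closed form. A minimal sketch of the unconstrained variant (names ours):

```python
import numpy as np

def locality_code(y, X, lam=0.9):
    """Locality-constrained representation of patch y over position dictionary X.

    tau_k = ||y - x_k||_2 penalizes far-away training patches, so
    min_s ||y - X s||^2 + lam * ||tau ⊙ s||^2 has the solution
    s = (X^T X + lam * diag(tau)^2)^{-1} X^T y.
    """
    tau = np.linalg.norm(X - y[:, None], axis=0)      # distance weights
    D2 = np.diag(tau ** 2)
    return np.linalg.solve(X.T @ X + lam * D2, X.T @ y)
```

Note that when the test patch coincides with a training patch, that sample's weight is zero and the solution concentrates entirely on it, which is exactly the locality behavior described above.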
Substituting (6) and (8) into (4), we obtain the final LCER framework for discriminative face hallucination:

  min_{s(i,j)} ||y^L(i, j) − X^L(i, j)s(i, j)||_2^2 + λ ||τ(i, j) ⊙ s(i, j)||_2^2 + γ Σ_{c=1}^{C} β_c ||X^L_c(i, j)s_c(i, j) − X^L(i, j)s(i, j)||_2^2,  s.t. 1^T s(i, j) = 1.  (10)

The proposed LCER model utilizes the locality prior and the discriminative information for face image hallucination. It is expected to combine the advantages of locality in preserving the true manifold structure and of discriminability in exposing the class-specific information. On the one hand, for discriminative face image super-resolution, since the label information of the test image is known in advance, it is reasonable to utilize the data from the same class to reconstruct the HR face image. On the other hand, the proposed framework is a position-patch-based method, in which small position patches serve as basic units to synthesize the target HR patches. Therefore, it is meaningful to use the nearest neighbors, not limited to the corresponding class but drawn from the whole training dataset, to represent the test LR patch, since similar small patterns can exist across different images in different classes [21].

B. Interpretation From Probability Perspective
For a deeper observation, the innovation of LCER is mainly attributed to the discriminative term γ Σ_{c=1}^{C} β_c ||X^L_c s_c − X^L s||_2^2, which makes the proposed LCER significantly different from LcR and other typical face hallucination models. Actually, the discriminative term in LCER can be interpreted from the probability viewpoint as follows.
Consider a collection of LR training samples from C classes, X^L = [X^L_1, X^L_2, ..., X^L_C], where X^L_c denotes the data matrix of class c and each column of X^L_c is a sample vector. The data matrix X^L can be viewed as an expanded class, and we use l_{X^L} to represent the label set of all the candidate classes. Recall that discriminative learning tries to approximate y^L by a proper subspace rather than the holistic one. In other words, it is expected that the global reconstruction X^L s should be as close as possible to the class-specific reconstruction X^L_c s_c, so as to preserve the face subspace characteristics. This can be achieved by maximizing the probability P(l_c | X^L s). One can adopt the Gaussian kernel, a widely used measure to characterize neighbor-based similarity, to define this probability [42]:

  P(l_c | X^L s) ∝ exp(−||X^L_c s_c − X^L s||_2^2 / σ^2).  (11)

To further improve the representation capacity, we choose to maximize the joint probability over all candidate classes:

  max_s Π_{c=1}^{C} P(l_c | X^L s)^{β_c}.  (12)

Applying the logarithmic operator and discarding the constant terms, (12) can be reformulated as

  min_s Σ_{c=1}^{C} β_c ||X^L_c s_c − X^L s||_2^2,  (13)

where β_c is the class-specific parameter, indicating certain prior knowledge. As can be seen, (13) is exactly the third term of the objective function in LCER.

C. Optimization Strategy
The proposed LCER model admits a closed-form solution since it is actually a regularized least-squares problem. In this section, the indices i and j are omitted where no confusion arises. For convenience, we write the objective function of (10) as a sum of subobjectives:

  J(s) = ||y^L − X^L s||_2^2 + λ ||τ ⊙ s||_2^2 + γ Σ_{c=1}^{C} β_c ||X^L_c s_c − X^L s||_2^2,  (14)

with the three terms being the data fidelity term, the locality regularization term, and the category regularization term, respectively. The second term can be rewritten as

  ||τ ⊙ s||_2^2 = ||diag(τ)s||_2^2 = s^T diag(τ)^T diag(τ)s,  (15)

where diag(·) is the diagonalization operation that transforms a vector into a diagonal matrix with the vector on the diagonal.
To maintain the consistency of the variables, we convert X_c into Ẋ_c, which has the same size as X. The elements of X_c are assigned to Ẋ_c at their corresponding locations in X, while the elements at the other locations are set to zero. Mathematically, Ẋ_c is defined as

  Ẋ_c = [0, ..., 0, X_c, 0, ..., 0].  (16)

Therefore, the third term in (14) can be reformulated as

  Σ_{c=1}^{C} β_c ||Ẋ^L_c s − X^L s||_2^2 = Σ_{c=1}^{C} β_c ||X̄^L_c s||_2^2,  (17)

in which X̄^L_c = X^L − Ẋ^L_c. With the constraint Σ_{k=1}^{K} s_k = 1, we rewrite the objective function into the quadratic form

  J(s) = s^T Q s,  (18)

where

  Q = (y^L 1^T − X^L)^T (y^L 1^T − X^L) + λ diag(τ)^T diag(τ) + γ Σ_{c=1}^{C} β_c (X̄^L_c)^T X̄^L_c.

Finally, the optimization problem in (10) is formulated as

  min_s s^T Q s,  s.t. 1^T s = 1.  (19)

The Lagrangian of the objective function is

  L(s, ν) = s^T Q s + ν(1^T s − 1),  (20)

where ν is the Lagrange multiplier. The optimal s can be obtained by taking the derivatives of L with respect to s and ν and setting them to zero:

  2Qs + ν1 = 0,  1^T s = 1.  (21)

From the first equation in (21), one can obtain

  s = −(ν/2) Q^{−1} 1.  (22)

Substituting (22) into the second equation in (21), we have

  ν = −2 / (1^T Q^{−1} 1).  (23)

Finally, by substituting (23) into the first equation in (21), the optimal s is obtained as

  s = Q^{−1} 1 / (1^T Q^{−1} 1).  (24)

Instead, one can also solve the linear system of equations Qs = 1 to avoid computing the inverse of Q; the obtained coefficient vector ŝ then needs to be normalized so as to meet the constraint 1^T s = 1.

D. Face Hallucination via LCER

Given the HR and LR training sets, the primary task is to forecast the HR face image Y^H from its LR observation Y^L. First, the input LR and training face images are all partitioned into small overlapping patches by the separation strategy used in [23]. Each input LR test image patch can then be represented via a linear combination of the LR training patches located at the identical position by using the proposed LCER model. By doing this, a weight vector (representation coefficient) is obtained, associated with the LR training image patches. Analogously, by assuming that the HR image patch manifold shares the identical topological structure with the LR one [10], the target HR image patch can be reconstructed by transferring the weights from the LR image patch manifold to the corresponding HR one.
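The constrained solve derived above (minimize s^T Q s subject to 1^T s = 1) can be sketched numerically as follows. The class weights β_c and the class-masked matrices X̄_c are supplied by the caller; the diagonal stabilizer is our addition, not part of the original derivation:

```python
import numpy as np

def lcer_solve(y, X, X_bars, betas, lam=0.9, gamma=0.005):
    """Sketch of the LCER patch solve: min s^T Q s with 1^T s = 1.

    X_bars[c] = X with the columns of class c zeroed out (i.e., X - Ẋ_c);
    betas[c] is the class weight. Q collects fidelity, locality, and
    category terms, then the linear system Q s = 1 is solved and the
    result normalized, avoiding an explicit matrix inverse.
    """
    K = X.shape[1]
    tau = np.linalg.norm(X - y[:, None], axis=0)       # locality weights
    Z = y[:, None] - X                                 # fidelity under 1^T s = 1
    Q = Z.T @ Z + lam * np.diag(tau ** 2)
    for beta, Xb in zip(betas, X_bars):
        Q += gamma * beta * (Xb.T @ Xb)                # category term
    Q += 1e-8 * np.eye(K)                              # numerical stabilizer (ours)
    s = np.linalg.solve(Q, np.ones(K))                 # solve Q s = 1 ...
    return s / s.sum()                                 # ... then renormalize
```

The defaults lam = 0.9 and gamma = 0.005 match the parameter settings reported in the experimental section.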
The final HR face image is achieved by concatenating all the synthesized HR image patches at their corresponding positions, with the pixel values in the overlapping regions being averaged. The flow chart of the position-patch-based face hallucination method via LCER is shown in Fig. 3, and the implementation details are summarized in Algorithm 1.

E. Initial Label Estimation
As discussed previously, the label information of the input LR face is needed in the proposed LCER method. However, in practical application scenarios, e.g., the face recognition system, the label of the input image is actually unknown and usually needs to be identified first. An intuitive way to solve this problem is to employ a well-known classification algorithm to recognize the label of the input image; various face recognition methods could be chosen for this purpose. In this article, we estimate the label of the input LR face with the LLC algorithm. Mathematically, the input LR face y^L is first represented via the LR training samples with the LLC model

  min_s ||y^L − X^L s||_2^2 + λ ||τ ⊙ s||_2^2,  s.t. 1^T s = 1,  (25)

where X^L is the codebook with each column being a vectorized training image, τ is the correlation adaptor, and s is the corresponding encoding coefficient vector.

Algorithm 1 Robust Face Super-Resolution via LCER
Once the optimal coefficient vector ŝ is obtained, the sparse coding residual associated with each class can be calculated as

  r_i = ||y^L − X^L δ_i(ŝ)||_2,  (26)

where δ_i(ŝ) means selecting the coefficients of ŝ corresponding to the ith class (with the others set to zero).
Finally, the class of the input LR face y^L is determined as the one that achieves the minimal sparse coding residual:

  label(y^L) = arg min_i r_i.  (27)
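The residual-based label estimation described above can be sketched as follows (names and the class-index encoding are ours):

```python
import numpy as np

def estimate_label(y, X, s_hat, class_index):
    """Pick the class with the smallest per-class coding residual (sketch).

    class_index[k] gives the class id of the k-th training column;
    delta_i keeps only the coefficients of class i and zeroes the rest.
    """
    classes = np.unique(class_index)
    residuals = []
    for i in classes:
        delta_i = np.where(class_index == i, s_hat, 0.0)
        residuals.append(np.linalg.norm(y - X @ delta_i))
    return classes[int(np.argmin(residuals))]
```

This is the same decision rule popularized by sparse-representation classification: the class whose coefficients alone best reconstruct the input wins.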

A. Experimental Settings
In our experiments, two public face datasets are chosen to evaluate LCER in face hallucination from different perspectives.
AR Dataset: A subset of 700 face images associated with 50 subjects (persons) is selected for the experiments. Each subject thus holds 14 face images, and we randomly select one image for testing, with the remaining 13 used for training; thus, all the test face images are absent from the training set. The HR face images are cropped to 120 × 100 pixels and aligned according to the positions of the two eyes.
NUST RWFR Dataset: There are 2400 face images in total, with a size of 80 × 80 pixels, corresponding to 100 persons (classes) in the NUST RWFR dataset. That is, each person holds 24 face images with various expressions and illuminations. Moreover, the face images in the NUST RWFR dataset are not well aligned, leading to a more difficult hallucination problem. One face image was randomly chosen from each class for testing. Therefore, 100 face images are used as test images, while the remaining 2300 are used for training.
The LR face images are obtained by downsampling (by a factor of 4) and blurring (with a 4 × 4 average smoothing filter) the corresponding HR faces. For the position-patch-based methods, small image patches must first be extracted from the holistic image. To balance super-resolution performance and computational time, the HR patches are set to a size of 12 × 12 pixels, with four pixels overlapped between two adjacent patches (the experiments on patch size and overlap are reported in Section III-F). The two parameters λ and γ are empirically set to 0.9 and 0.005, respectively.
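The degradation used above (4 × 4 average blur followed by ×4 downsampling) can be sketched as non-overlapping block averaging, since sampling the average-filtered image at stride 4 is equivalent to averaging each 4 × 4 block; this fused simplification and the names are ours:

```python
import numpy as np

def degrade(hr, factor=4):
    """Blur an HR image with a factor x factor average filter, then
    downsample by the same factor (fused into block averaging)."""
    h, w = hr.shape
    h, w = h - h % factor, w - w % factor          # crop to a multiple of factor
    blocks = hr[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))
```

For the 120 × 100 AR faces this yields 30 × 25 LR images; larger factors (8, 16) used later in the experiments follow the same recipe.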

B. Comparison on Standard Datasets
This section evaluates the proposed LCER method by comparing it with five state-of-the-art face super-resolution approaches, namely, LSR [13], LcR [23], the linear model of coupled sparse support (LM-CSS) [22], SSR [20], and RLcBR [31]. The hallucination results are quantitatively evaluated by two commonly used quality assessment measures, namely, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [49]. Generally speaking, higher PSNR and SSIM values indicate better quality of the hallucination results. The source codes of LcR, LM-CSS, SSR, and RLcBR were provided by the original authors, while the other methods were implemented by ourselves. Fig. 4 plots the PSNR and SSIM scores of the hallucination results for all 50 test images in the AR dataset generated by these methods, while the averaged PSNR and SSIM values are reported in Table I. One can see that the proposed LCER method obtains the highest PSNR and SSIM values for almost all the face images in the AR dataset. These quantitative results indicate that the proposed LCER method achieves the best face super-resolution performance, owing to LCER taking into consideration not only the locality prior of the patch manifold but also the discriminative label information of the training data in the super-resolution process.
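The PSNR measure used above follows the standard definition over 8-bit intensities; a minimal sketch (SSIM is omitted here, see [49]):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio between two images, in dB."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")           # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```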
To give some visual impressions, Fig. 5 shows some face hallucination results of the AR database from all the compared methods. In this figure, the first column exhibits the LR faces, the last column lists the ground truth, and columns 2-6 present the hallucination results of different methods, respectively. From Figs. 4 and 5, the following conclusion can be drawn.
1) The locality-constrained-based methods (e.g., LcR and RLcBR) obtain better super-resolution results than the LSR-based method (i.e., LSR). This verifies that the locality prior, which has been shown to be more important than sparsity, is indeed helpful for reconstructing local detailed patterns in the hallucination results.
2) The two-layer representation methods (e.g., LM-CSS and RLcBR) achieve relatively poor performance. The reasons are as follows. Though LM-CSS employs two layers to exploit the local geometrical structure of the HR manifold for super-resolution reconstruction, its performance is limited by the representation capacity of the collaborative representation model. The RLcBR utilizes an HR layer to compensate the noisy LR layer, which is effective for noisy face hallucination but useless for clean face hallucination.
3) The SSR obtains better super-resolution results than the locality prior-based methods (LcR, RLcBR). This is because the SSR preserves the smooth geometrical structure of the training image patch space by encouraging similar training samples to hold similar sparse coding coefficients.
4) All the previous schemes are unsupervised and do not take the category characteristics into account in super-resolution, ignoring the discriminative face patterns. In contrast, the proposed LCER achieves the most satisfactory hallucination performance, in which the face details are well reconstructed. The reason lies in that the results of LCER benefit from the locality prior, which preserves the local geometrical manifold, as well as the category embedding, which ensures that the discriminative face patterns are mainly transferred from exemplar images belonging to the same subject.

C. Reconstruction Comparison of Each Class
In the proposed LCER model, a discriminative category embedding regularization is used to encourage the training samples from the correct category to play key roles in the representation. Such a strategy guarantees that the reconstructed face structures are mainly transferred from the exemplar face images corresponding to the same subject.
To show the effectiveness of the discriminative term, in what follows, we exhibit the reconstruction results of two face images in the AR dataset from the five most related classes. The reconstruction results are depicted in Fig. 6, where the first column lists the original HR faces and columns 2-6 show the reconstructed faces from the five most related classes. As can be seen, the reconstruction results from the correct class (the most relevant class), shown in the second column, contain the major patterns of the target HR faces. The reconstructions from the other classes hold few of these patterns but some detail information. This demonstrates that the discriminative term is indeed helpful in extracting the inherent characteristics of the same subject for super-resolution reconstruction.

D. Robustness Against Noise
In reality, captured images are inevitably affected by noise. Thus, a good face hallucination method should not only enhance face details but also be robust to noise. In this section, we conduct extensive experiments on the AR dataset to assess the robustness of all the compared methods by super-resolving noisy face images. The noisy LR faces were generated by adding different levels of Gaussian noise (with standard deviation σ = 1, 3, and 5) to the LR faces and were then fed directly into each compared method. The results were again evaluated by the PSNR and SSIM indexes, and the averaged PSNR and SSIM scores are plotted in Fig. 7. As can be seen, the LCER obtains the highest PSNR and SSIM values for noisy LR face image super-resolution, which demonstrates that the proposed method remains effective in noisy environments.
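The noisy inputs described above can be generated as follows (a sketch; the fixed seed is our addition for reproducibility):

```python
import numpy as np

def add_gaussian_noise(img, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    rng = np.random.default_rng(seed)
    return img + rng.normal(0.0, sigma, img.shape)
```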

E. Results on Very LR Face Images
In this section, extensive experiments were carried out to evaluate the efficiency of LCER in hallucinating very LR face images. The very LR faces are generated by blurring and downsampling the corresponding HR face images with factors of 8 and 16, respectively. The very LR face images were then hallucinated by all the compared methods, and the results are assessed by PSNR and SSIM. The averaged PSNR and SSIM values of all 50 test images in the AR dataset and 100 test images in the NUST-RWFR dataset with different magnification factors are tabulated in Table I. From this table, one can see that the proposed LCER method obtained higher PSNR and SSIM values than the other state-of-the-art methods in all cases. Interestingly, the larger the magnification factor, the larger the PSNR and SSIM gains achieved by the proposed LCER. Fig. 8 shows the super-resolution results of two LR face images downsampled by factors of 8 and 16 from different methods. As can be seen, the performance of all the compared methods degrades as the downsampling factor increases. This is reasonable, since the larger the downsampling factor, the more image details are lost in the LR images, resulting in a harder recovery problem. However, the LCER still preserves the texture details well, even at high magnification factors where the structure characteristics are almost lost because of the large downsampling factor.

F. Effect of Different Patch Sizes and Overlappings
In this section, we test the performance of the proposed LCER method under various settings of patch size and overlapping pixels. The experimental settings are as follows. The HR patch sizes are set to 4 × 4, 8 × 8, 12 × 12, and 16 × 16 pixels, with overlaps of 0, 4, 8, and 12 pixels between two adjacent patches, respectively. Accordingly, the patch sizes of the corresponding LR images are 1 × 1, 2 × 2, 3 × 3, and 4 × 4 pixels, with 0, 1, 2, and 3 pixels overlapped between two adjacent patches. The LCER method with these patch size and overlap settings was applied to the 50 LR face images in the AR dataset. The averaged PSNR and SSIM values are tabulated in Table II. As can be seen, for the same patch size, the performance of LCER improves as the overlap becomes larger. This is reasonable since a larger overlap brings in more information from adjacent patches to synthesize each pixel in the reconstructed results. For the same overlap, the performance of LCER improves as the patch size increases, except for the 16 × 16 patch size. Basically, image patches of large size contain more patterns than small patches and can provide more useful information for representation. However, too large a patch may cause over-smoothing and the loss of face details in super-resolution reconstruction, leading to undesirable results. Generally speaking, the patch size should be set neither too large nor too small: too small a patch size with few overlapping pixels will lose the human facial structure patterns in super-resolution reconstruction, while too large a patch size with many overlapped pixels will not only lose face details but also increase the computational complexity dramatically. Therefore, the HR patch size is set to 12 × 12 with 4 pixels overlapped between two adjacent patches to balance hallucination performance and computational complexity.
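Patch extraction under a given patch size and overlap reduces to sliding a window with stride patch − overlap; a minimal sketch follows, with the image size chosen purely for illustration:

```python
import numpy as np

def extract_patches(img, patch, overlap):
    """Collect all patch x patch windows, with `overlap` pixels shared
    between adjacent patches (i.e., stride = patch - overlap)."""
    step = patch - overlap
    h, w = img.shape
    out = [img[r:r + patch, c:c + patch]
           for r in range(0, h - patch + 1, step)
           for c in range(0, w - patch + 1, step)]
    return np.stack(out)

img = np.arange(48 * 48, dtype=np.float64).reshape(48, 48)
# The chosen HR setting: 12 x 12 patches with a 4-pixel overlap (stride 8).
patches = extract_patches(img, patch=12, overlap=4)
print(patches.shape)  # (25, 12, 12): a 5 x 5 grid of patches
```

Increasing the overlap shrinks the stride and multiplies the number of patches per image, which is the source of the extra computational cost noted above.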

G. Locality Versus Discriminability
The proposed LCER achieves promising face super-resolution performance, owing to the two priors it uses, namely, locality and discriminability. The former is used to capture the inherent manifold structure, while the latter is utilized to enhance the pattern representation. To further verify the roles they play in (10), we implement two variants of LCER. Denote by "LCER-L" the model obtained from LCER by removing the discriminative term, that is, setting γ = 0 in (10). The other variant, denoted by "LCER-D," is obtained from the final model by discarding the locality term, that is, setting λ = 0 in (10).
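To make the two ablations concrete, the sketch below solves a generic locality-constrained, category-embedding least squares for the representation weights. The exact objective in (10) and its solution in (24) are not reproduced in this section, so the penalty forms, penalty vectors, and parameter values here are our own assumptions, not the paper's formulation:

```python
import numpy as np

def lcer_weights(x, D, d, c, lam, gamma):
    """Generic sketch:
        min_w ||x - D w||^2 + lam ||diag(d) w||^2 + gamma ||diag(c) w||^2,
    where d holds locality penalties (distance of each training patch to x)
    and c holds category penalties (0 for the query's class, 1 otherwise).
    Solved in closed form via a K x K linear system."""
    A = D.T @ D + lam * np.diag(d ** 2) + gamma * np.diag(c ** 2)
    return np.linalg.solve(A, D.T @ x)

rng = np.random.default_rng(0)
n, K = 9, 20                                   # LR patch dim, training samples
D = rng.standard_normal((n, K))                # LR training patches as columns
x = rng.standard_normal(n)                     # LR query patch
d = np.linalg.norm(D - x[:, None], axis=0)     # locality penalties
c = (rng.integers(0, 4, K) != 0).astype(float) # 0 for same-class atoms

w_full = lcer_weights(x, D, d, c, lam=0.1, gamma=0.1)  # LCER
w_L = lcer_weights(x, D, d, c, lam=0.1, gamma=0.0)     # LCER-L: no category term
w_D = lcer_weights(x, D, d, c, lam=0.0, gamma=0.1)     # LCER-D: no locality term
```

Zeroing γ or λ simply drops one diagonal penalty from the system, so the two variants differ from the full model only in which prior shapes the weights.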
The two variants were then used to hallucinate the LR face images in the AR database and compared with LCER. The comparisons of averaged PSNR and SSIM values are shown in Fig. 9. From the figure, it can be seen that LCER-L and LCER-D achieve comparable performance, which indicates that the locality and discriminative priors make almost equal contributions to the representation. In contrast, LCER obtains the highest scores, which means that both locality and discriminability are necessary and meaningful in the final face hallucination model; without either of them, the performance of the final model degrades.

H. Hallucination of Real World Images
In the experiments described above, the LR images were generated by applying downsampling and blurring operators to the corresponding HR images. However, as pointed out in [50], the actual spatial feature correlation between the LR and HR face spaces cannot be correctly determined from such manually generated LR face images.
In what follows, we conduct extensive experiments on real-world images to further evaluate the performance of LCER in practical applications. Two images containing real-life scenes, shown in Fig. 10, are chosen from the CMU + MIT database [51] for testing. The two images are of low resolution and low quality since they were captured from a certain distance. To better simulate real scenarios, the two images are corrupted by Gaussian noise, and seven face images are manually cropped from them for testing. The comparative hallucination results are shown in Figs. 11 and 12. As can be observed, the proposed LCER method can still produce high-quality HR face images in real-world scenarios.

I. Computational Complexity Analysis
The computational cost of our proposed method mainly comes from the calculation of the matrix Q and the matrix inverse and product in (24). Basically, the calculation of the matrix Q requires O(nK^2 + CK^3) operations. For the matrix inverse and multiplications, O(2K^3) operations are needed. Thus, the total computational complexity of LCER is O(nK^2 + (C + 2)K^3) for each LR test patch, where n is the dimension of an LR patch, C is the number of classes in the training data, and K denotes the total number of training samples.
In addition, we compared the running time of our proposed method with that of the other methods. One face image in the AR dataset was randomly chosen as the test data, and all the experiments were carried out on a PC equipped with a 2.60-GHz CPU. The running time of each method is tabulated in Table III. One can see that LCER is slightly slower than the single-regression methods (LSR, LcR) but faster than the two layer-based methods (LM-CSS, RLcBR) and the smoothness-regularized method (SSR).

IV. CONCLUSION
This article presented a novel LCER method for discriminative face image super-resolution. In contrast with previous works, which are unsupervised and take no account of class label information, the proposed LCER super-resolves face images in a supervised manner by learning a category embedding representation. More specifically, a category embedding penalty term is introduced into the objective function, enabling the training samples that share the label of the query to contribute more to the super-resolution reconstruction. This guarantees that the face patterns are transferred mainly from exemplar faces belonging to the same subject. Besides, considering that similar small patches are distributed throughout the whole training data space, we employed a locality penalty to encourage training samples that are close to the test one, and not limited to the same class, to contribute more to the final representation. The union of the locality prior and the category information allows the hallucination results of LCER to preserve more image details and desirable patterns. The evaluation results demonstrate the superiority of LCER over several state-of-the-art schemes in terms of both quantitative measurements and visual impressions.