Locality Constraint Dictionary Learning with Support Vector for Pattern Classification

Discriminative dictionary learning (DDL) has recently gained significant attention due to its impressive performance in various pattern classification tasks. However, the locality of atoms is not fully explored in conventional DDL approaches, which hampers their classification performance. In this paper, we propose a locality constraint dictionary learning with support vector discriminative term (LCDL-SV), in which the locality information is preserved by employing the graph Laplacian matrix of the learned dictionary. To jointly learn a classifier during the training phase, a support vector discriminative term is incorporated into the proposed objective function. Moreover, in the classification stage, the identity of test data is jointly determined by the regularized residual and the learned multi-class support vector machine. Finally, the resulting optimization problem is solved by an alternating optimization strategy. Experimental results on benchmark databases demonstrate the superiority of our proposed method over previous dictionary learning approaches on both hand-crafted and deep features. The source code of our proposed LCDL-SV is accessible at https://github.com/yinhefeng/LCDL-SV


I. INTRODUCTION
Dictionary learning (DL) has aroused considerable interest during the past decade and has been adopted in a wide range of applications, such as face recognition [1], image fusion [2] and person re-identification [3], [4]. According to the characteristics of the learned dictionary, existing DL approaches for pattern classification can be divided into three categories: synthesis dictionary learning (SDL), analysis dictionary learning (ADL) and dictionary pair learning (DPL). In SDL, the dictionary is employed to represent the input data as a linear superposition of atoms. ADL aims to yield the sparse representation by exploiting the dictionary as a transformation matrix. DPL, also referred to as analysis-synthesis dictionary learning (ASDL), jointly learns a synthesis dictionary and an analysis dictionary. According to whether the dictionary is shared across classes, SDL can be further divided into three types, i.e., shared SDL, class-specific SDL and hybrid SDL. Similarly, ADL can be classified into two categories, i.e., shared ADL and class-specific ADL. Fig. 1 presents a taxonomy of dictionary learning approaches for pattern classification.
In class-specific SDL, a sub-dictionary for each class is learned independently, and then all the sub-dictionaries are concatenated to form the final dictionary. Ramirez et al. [5] presented a dictionary learning with structured incoherence (DLSI) method, which imposes an incoherence constraint on the sub-dictionaries so as to encourage the dictionaries corresponding to different classes to be as independent as possible. Yang et al. [6] proposed a metaface learning (MFL) algorithm which learns a set of metafaces for each class. Yang et al. [7] developed a Fisher discrimination dictionary learning (FDDL) method which imposes the Fisher discrimination criterion on the coding coefficients to learn class-specific sub-dictionaries. Considering that different training samples contribute unequally to the dictionary, Liu et al. [8] proposed a class specific dictionary learning (CSDL) approach. Akhtar et al. [9] developed a joint discriminative Bayesian dictionary and classifier learning (JBDC) approach which associates the dictionary atoms with the class labels using Bernoulli distributions. By employing the directions of coefficients to promote the discriminative capability of representation, Wang et al. [10] presented a unidirectional representation dictionary learning (URDL) algorithm. Ling et al. [11] proposed a class-oriented discriminative DL (CODDL) method, in which the class-specific sub-dictionaries are learned in a class-wise fashion.
In shared SDL, a universal dictionary shared by all classes is learned. The most classic shared SDL approach is the K-SVD algorithm [12], which has been successfully applied to image compression and denoising. However, K-SVD mainly focuses on the representational ability of the dictionary without considering its capability for classification. To address this problem, Zhang et al. [13] proposed a discriminative K-SVD (D-KSVD) method by introducing the classification error into the framework of K-SVD. Jiang et al. [14] further incorporated a label consistency constraint into K-SVD and presented a label consistent K-SVD (LC-KSVD) algorithm. LC-KSVD uses the $\ell_0$-norm sparse regularization term, which makes it difficult to find the optimal sparse solution. To overcome this limitation, Shao et al. [15] explored a label embedded dictionary learning (LEDL) method which utilizes the $\ell_1$-norm as the sparse regularization term. By jointly learning a multi-class support vector machine (SVM) classifier, Cai et al. [16] developed a support vector guided dictionary learning (SVGDL) model. Zhang et al. [17] designed a class relatedness oriented discriminative DL (CRO-DDL) method which imposes the $\ell_{1,\infty}$-norm constraint [18] on the coding coefficient matrix. By integrating the training of multiple classifiers into the dictionary learning process, Quan et al. [19] presented a multiple classifiers based dictionary learning (MCDL) method. Dong et al. [20] proposed an orthonormal DL method which imposes an orthonormal constraint on the learned dictionary to enforce the dictionary atoms to be as dissimilar as possible. Min et al. [21] constructed a Laplacian regularized locality-constrained coding (LapLLC) algorithm for image classification, in which the similarity matrix is defined on the training data. To fully exploit the locality and label information of the learned dictionary, Li et al. [22] constructed a locality-constrained and label embedding dictionary learning (LCLE-DL) algorithm. Song et al. [23] presented a class-wise discriminative DL (CW-DDL) method which introduces a label-aware constraint and graph regularization into the framework of SDL. By employing the profiles (row vectors of the coding coefficient matrix) to construct discriminative terms in SDL, Li et al. [24] proposed an interactively constrained discriminative DL (IC-DDL) algorithm for image classification.
In hybrid SDL, a dictionary that contains several class-specific sub-dictionaries and a shared dictionary is learned. Kong et al. [25] proposed a DL approach dubbed DL-COPAR which explicitly learns the shared patterns (the commonality) and the class-specific dictionaries (the particularity). Gao et al. [26] developed a category-specific and shared dictionary learning (CSDL) method for fine-grained image categorization. Sun et al. [27] presented a discriminative group sparse dictionary learning (DGSDL) model which learns a class-specific sub-dictionary for each class as well as a common sub-dictionary shared by all classes. By introducing a cross-label suppression constraint and a group regularization term into the framework of SDL, Wang et al. [28] designed a cross-label suppression discriminative DL (CLS-DDL) approach. Lin et al. [29] proposed a robust, discriminative and comprehensive dictionary learning (RD-CDL) model which learns a class-shared dictionary, class-specific dictionaries and a disturbance dictionary to represent the commonality, particularity and disturbance components in the data. To tackle corrupted samples, Vu et al. [30] developed a low-rank shared dictionary learning (LRSDL) framework which simultaneously learns a set of common patterns and class-specific features for classification. By integrating the low-rank matrix recovery technique with class-specific and class-shared dictionary learning, Rong et al. [31] explored a low-rank double dictionary learning (LRD$^2$L) approach. Du et al. [32] proposed a low-rank graph preserving discriminative dictionary learning (LRGPDDL) method which incorporates the low-rank constraint on the class-specific dictionaries, the graph preserving criterion and the dictionary incoherence term into the framework of SDL. Readers can refer to [33] for a survey of SDL approaches.
Recently, ADL has received increasing attention due to its efficacy and efficiency, and shared ADL has been widely studied. Rubinstein et al. [34] presented analysis K-SVD, which parallels the synthesis K-SVD [12]. Afterwards, Shekhar et al. [35] applied ADL to image classification tasks and obtained comparable or better recognition performance than conventional SDL models. To enhance the classification performance of ADL, Guo et al. [36] proposed a discriminative ADL (DADL) method. By introducing a synthesis-linear-classifier-based error term into the basic ADL model, Wang et al. [37] presented a synthesis linear classifier based ADL (SLC-ADL) algorithm. By solving the joint learning of ADL and a linear classifier with a K-SVD based technique, Wang et al. [38] designed a synthesis K-SVD based ADL (SK-SVDADL) method. Similar to LC-KSVD [14], Tang et al. [39] incorporated the label consistency term and the classification error term into the framework of ADL and developed a structured ADL (SADL) approach. Maggu et al. [40] proposed label consistent transform learning (LCTL) for hyperspectral image classification; in essence, transform learning and ADL have similar formulations. For class-specific ADL, Wang et al. [41] proposed a class-aware ADL model which learns a discriminative analysis sub-dictionary for each class.
In DPL, a pair of synthesis and analysis dictionaries is learned from the input data. Gu et al. [42] presented a projective dictionary pair learning (PDPL) framework which jointly learns a synthesis dictionary and an analysis dictionary. To further enhance the discriminative ability of DPL, Chen et al. [43] developed a discriminative DL approach called DPL-SV which introduces a differentiable support vector discriminative term into the DPL model. DPL does not impose a sparse constraint on the representation matrix, which may lose the discriminative power of the sparse property. To alleviate this problem, Zhang et al. [44] designed a joint label consistent embedding and dictionary learning (JEDL) model which explicitly imposes a sparse constraint on the representation matrix. To preserve the locality property of the learned atoms in the synthesis dictionary, Zhang et al. [45] proposed a locality constrained projective dictionary learning (LC-PDL) method. By jointly learning a classifier with the dictionary pair, Yang et al. [46] explored a discriminative analysis-synthesis dictionary learning (DASDL) model. To preserve the local geometric structure of the input data, Chang et al. [47] presented a graph-regularized discriminative analysis-synthesis dictionary pair learning (GDASDL) model to enhance the classification performance of DASDL. To integrate structured dictionary learning, analysis representation and analysis classifier training into a unified framework, Zhang et al. [48] proposed an analysis discriminative dictionary learning (ADDL) algorithm. Inspired by the superiority of the $\ell_{1,\infty}$ norm [18], Wei et al. [49] developed a fast DDL (FaDDL) method for synthetic aperture radar (SAR) image classification. The ordinal locality of the analysis dictionary is not fully exploited in DPL and the above variants; to tackle this problem, Li et al. [50] proposed a discriminative low-rank analysis-synthesis dictionary learning (LR-ASDL) algorithm with adaptively ordinal locality.
In addition to the above DL approaches, several multi-view DL methods have recently been presented to deal with multi-view data. Wu et al. [51] offered a multi-view low-rank dictionary learning (MLDL) method for image classification. Wu et al. [52] proposed a multi-view discriminant dictionary learning via learning view-specific and shared structured dictionaries (MDVSD) for image classification, in which a structured dictionary shared by all views and multiple view-specific structured dictionaries are learned simultaneously. Ma et al. [53] developed a multi-view coupled dictionary pair learning (MVCDL) framework for person re-identification. Wu et al. [54] presented a multi-view synthesis and analysis dictionaries learning (MSADL) approach for pattern classification.
However, ADL often requires a large number of atoms to achieve satisfactory results when applied to pattern classification. For hybrid SDL, how to choose the optimal number of shared atoms remains unresolved. Moreover, the optimization process of class-specific SDL is time-consuming, especially when the number of classes is large. In this paper, we propose a locality constraint dictionary learning with support vector discriminative term (LCDL-SV) for pattern classification, which belongs to the shared SDL category. A support vector discriminative term is introduced to promote the discrimination of coding coefficients. Since the original training data may contain noise or outliers, a graph Laplacian matrix constructed on the original training samples cannot faithfully describe the manifold structure. To alleviate this problem, we impose a locality constraint on the atoms. Because the atoms are updated during dictionary learning, the graph Laplacian matrix defined on the atoms is updated as well. More importantly, to further enhance the classification performance of our proposed method, the regularized residual and the learned multi-class SVM classifier are jointly exploited to classify the test data. The flowchart of our proposed method for classification is illustrated in Fig. 2. Firstly, features are extracted from the original training and test samples. Then the training data is fed into our proposed dictionary learning algorithm; when the dictionary learning process is completed, a compact dictionary and a multi-class SVM are obtained. Finally, the test data is classified based on the learned multi-class SVM and the regularized residual. Our main contributions are summarized as follows.
• A locality constraint on atoms is introduced in our approach, and this term can intrinsically inherit the manifold structure of the training data.
• In addition to the learned SVM classifier, we take the regularized residual into account to further promote the classification performance.
• The resulting problem is solved elegantly by an alternating optimization technique.

VOLUME 4, 2016

The remainder of this paper is structured as follows. Section II reviews related work on SDL. In Section III, we present our proposed approach, and detailed optimization procedures are given in Section IV. Section V reports experimental results on five benchmark datasets. Finally, conclusions are drawn in Section VI.


II. RELATED WORK
In this section, we briefly review some related work, including the basic K-SVD [12] and its two discriminative variants, i.e., D-KSVD [13] and LC-KSVD [14]. In addition, the support vector guided dictionary learning (SVGDL) method [16] is introduced. To begin with, we introduce the notations used throughout this paper. Let $X = [X_1, X_2, \ldots, X_C] = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m\times n}$ be the data matrix of $n$ training samples belonging to $C$ classes, where $m$ is the dimension of the vectorized data and $n$ is the total number of training samples. $D \in \mathbb{R}^{m\times K}$ is the learned dictionary, which has $K$ atoms, and $Z = [z_1, z_2, \ldots, z_n] \in \mathbb{R}^{K\times n}$ is the coding coefficient matrix of $X$ over the dictionary $D$.

A. K-SVD AND ITS DISCRIMINATIVE VARIANTS
By generalizing the K-means clustering process, Aharon et al. [12] developed K-SVD to learn an overcomplete dictionary that best suits the given data. The objective function of K-SVD is formulated as follows:
$$\min_{D,Z} \|X - DZ\|_F^2 \quad \text{s.t.} \quad \|z_i\|_0 \le T_0,\ \forall i, \tag{1}$$
where $D \in \mathbb{R}^{m\times K}$ is the dictionary to be learned, $Z \in \mathbb{R}^{K\times n}$ is the coding coefficient matrix, and $T_0$ is a given sparsity level.
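The alternation behind K-SVD can be illustrated with a short NumPy sketch for the problem $\min_{D,Z} \|X - DZ\|_F^2$ s.t. $\|z_i\|_0 \le T_0$. This is a minimal illustration, not the reference implementation of [12]: the sparse coding step uses a basic orthogonal matching pursuit (OMP), and each atom is then updated together with the coefficients that use it by a rank-1 SVD of the corresponding residual. The function names `omp` and `ksvd`, the initialization from random training samples and the fixed iteration count are choices made for this sketch.

```python
import numpy as np


def omp(D, x, T0):
    """Greedy orthogonal matching pursuit for min ||x - D z||_2 s.t. ||z||_0 <= T0.

    Assumes the columns (atoms) of D have unit l2 norm.
    """
    z = np.zeros(D.shape[1])
    support = []
    coef = np.zeros(0)
    residual = x.astype(float)
    for _ in range(T0):
        # select the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no unselected atom correlates better; stop early
        support.append(k)
        # re-fit all selected coefficients by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    z[support] = coef
    return z


def ksvd(X, K, T0, n_iter=10, seed=0):
    """Minimal K-SVD: alternate OMP sparse coding and rank-1 atom updates."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # initialise atoms with randomly chosen, l2-normalised training samples
    D = X[:, rng.choice(n, size=K, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    Z = np.zeros((K, n))
    for _ in range(n_iter):
        # sparse coding step: D fixed, update every column of Z
        Z = np.column_stack([omp(D, X[:, i], T0) for i in range(n)])
        # dictionary update step: refine one atom (and its coefficients) at a time
        for k in range(K):
            users = np.flatnonzero(Z[k])
            if users.size == 0:
                continue  # unused atom: left unchanged in this sketch
            # residual of the samples using atom k, with atom k's contribution added back
            E = X[:, users] - D @ Z[:, users] + np.outer(D[:, k], Z[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]            # best rank-1 fit gives a unit-norm atom
            Z[k, users] = s[0] * Vt[0]   # coefficients keep their original support
    return D, Z
```

Restricting the atom update to the samples that already use atom $k$ is what lets K-SVD update an atom and its coefficients jointly without violating the sparsity constraint.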
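Problem (1) can be solved by alternately updating $D$ and $Z$. Although K-SVD yields impressive results in image compression and denoising, it is not tailored for classification. To make K-SVD suitable for classification problems, Zhang et al. [13] proposed the D-KSVD algorithm by introducing a classification error term into the framework of K-SVD:
$$\min_{D,Z,W} \|X - DZ\|_F^2 + \gamma\|H - WZ\|_F^2 \quad \text{s.t.} \quad \|z_i\|_0 \le T_0,\ \forall i, \tag{2}$$
where $\gamma$ is a balancing parameter, $H = [h_1, h_2, \ldots, h_n] \in \mathbb{R}^{C\times n}$ is the label matrix, $h_i$ is the label vector of $x_i$, and $W$ contains the parameters of a linear classifier. As can be seen from (2), the dictionary and a linear classifier are jointly learned in D-KSVD. Afterwards, Jiang et al. [14] presented LC-KSVD by solving the following optimization problem:
$$\min_{D,Z,A,W} \|X - DZ\|_F^2 + \alpha\|Q - AZ\|_F^2 + \beta\|H - WZ\|_F^2 \quad \text{s.t.} \quad \|z_i\|_0 \le T_0,\ \forall i, \tag{3}$$
where $\alpha$ and $\beta$ are balancing parameters, $Q = [q_1, q_2, \ldots, q_n] \in \mathbb{R}^{K\times n}$ is an ideal representation matrix, and $A$ is a linear transformation matrix.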
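Both D-KSVD and LC-KSVD are optimized by reducing them to a plain K-SVD problem: the extra Frobenius terms are absorbed by stacking matrices, since $\|X - DZ\|_F^2 + \alpha\|Q - AZ\|_F^2 + \beta\|H - WZ\|_F^2 = \|[X; \sqrt{\alpha}Q; \sqrt{\beta}H] - [D; \sqrt{\alpha}A; \sqrt{\beta}W]\,Z\|_F^2$. Dropping $Q$ and $A$ (and renaming $\beta$ to $\gamma$) gives the D-KSVD case. The sketch below, with hypothetical helper names, only checks this identity numerically; the published algorithms additionally normalize the columns of the stacked dictionary while running K-SVD, which is not done here.

```python
import numpy as np


def stack_lcksvd(X, Q, H, D, A, W, alpha, beta):
    """Stack the LC-KSVD terms so that the three Frobenius terms become a
    single reconstruction error ||X_new - D_new Z||_F^2."""
    X_new = np.vstack([X, np.sqrt(alpha) * Q, np.sqrt(beta) * H])
    D_new = np.vstack([D, np.sqrt(alpha) * A, np.sqrt(beta) * W])
    return X_new, D_new


def lcksvd_objective(X, Q, H, D, A, W, Z, alpha, beta):
    """Data-fitting part of the LC-KSVD objective (sparsity handled separately)."""
    return (np.linalg.norm(X - D @ Z) ** 2
            + alpha * np.linalg.norm(Q - A @ Z) ** 2
            + beta * np.linalg.norm(H - W @ Z) ** 2)
```

Because the stacked problem has exactly the form of the K-SVD problem, the same sparse coding and atom update machinery can be reused unchanged.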

B. SVGDL
To promote the discriminative ability of coding vectors, Cai et al. [16] introduced a multi-class SVM regularization term into the framework of SDL. The regularization term is defined as follows:
$$\mathcal{L}(Z, y, U, b) = \sum_{c=1}^{C} \Big[ \|u_c\|_2^2 + \theta \sum_{i=1}^{n} \ell(z_i, y_i^c, u_c, b_c) \Big], \tag{4}$$
where $u_c$ is the normal vector of the $c$-th class hyperplane of the SVM, $b_c$ is the corresponding bias, $y^c = [y_1^c, y_2^c, \ldots, y_n^c]$ is defined by $y_i^c = 1$ if the class label $y_i = c$ and $y_i^c = -1$ otherwise, $\ell(\cdot)$ is the hinge loss function, and $\theta$ is a penalty parameter.
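A direct NumPy evaluation of this SVM regularizer may help make the one-vs-rest encoding concrete. The sketch computes $\sum_c \big[\|u_c\|_2^2 + \theta \sum_i \ell\big]$ with the standard hinge $\ell = \max(0, 1 - y_i^c(u_c^T z_i + b_c))$. Classes are indexed from 0, $U$ stores the $u_c$ as columns, and the optional `squared` flag switches to the quadratic hinge; these conventions and the function names are assumptions of the sketch.

```python
import numpy as np


def one_vs_rest_labels(y, C):
    """Y[c, i] = +1 if sample i belongs to class c, and -1 otherwise."""
    Y = -np.ones((C, y.size))
    Y[y, np.arange(y.size)] = 1.0
    return Y


def svm_term(Z, y, U, b, theta, squared=False):
    """Multi-class (one-vs-rest) SVM regularizer
    sum_c ||u_c||^2 + theta * sum_i loss(y_i^c (u_c^T z_i + b_c)).

    Z: (K, n) codes, y: (n,) integer labels in [0, C),
    U: (K, C) with u_c as columns, b: (C,) biases.
    """
    C = U.shape[1]
    Y = one_vs_rest_labels(y, C)
    margins = Y * (U.T @ Z + b[:, None])  # (C, n) signed margins
    loss = np.maximum(0.0, 1.0 - margins)  # hinge loss
    if squared:
        loss = loss ** 2  # quadratic hinge variant
    return float(np.sum(U ** 2) + theta * np.sum(loss))
```

For example, with one coding dimension, codes `Z = [[1, -1]]`, labels `[0, 1]`, `U = [[1, -1]]` and biases `b = [0.5, 0]`, only the second sample violates the class-0 margin (by 0.5), so with $\theta = 0.2$ the term equals $2 + 0.2 \times 0.5 = 2.1$.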
The objective function of SVGDL is formulated as follows,

III. PROPOSED METHOD
In this section, our proposed LCDL-SV is presented. First we will introduce a locality constraint on the atoms of the learned dictionary. Then by incorporating the locality constraint and the support vector discriminative term into the framework of SDL, we will present the formulations of our proposed LCDL-SV.

A. LOCALITY CONSTRAINT ON ATOMS
As mentioned earlier, $Z$ is the coding coefficient matrix of the training data $X$ over the dictionary $D$, and $z_i = [z_{1,i}, z_{2,i}, \ldots, z_{K,i}]^T$ $(i = 1, 2, \ldots, n)$ is the coding vector of $x_i$ over $D$. The input training data can be represented as a linear combination of the atoms in the dictionary, as illustrated in Fig. 3. In [55] and [22], the $j$-th row vector of $Z$ is called the profile of atom $d_j$. Thus, $\hat{z}_j = [z_{j,1}, z_{j,2}, \ldots, z_{j,n}]^T$ $(j = 1, 2, \ldots, K)$ is the profile of atom $d_j$, and the red rectangle in Fig. 3 depicts the profile $\hat{z}_j$. The profile matrix can then be constructed as $Z^T = [\hat{z}_1, \hat{z}_2, \ldots, \hat{z}_K] \in \mathbb{R}^{n\times K}$, which is the transpose of $Z$. Based on the definition of the profile, the linear representations in Fig. 3 can be reformulated as follows:
$$X = DZ = \sum_{j=1}^{K} d_j \hat{z}_j^T. \tag{6}$$
From (6), one can see that the profile $\hat{z}_j$ and the atom $d_j$ have a one-to-one correspondence. In this paper, instead of preserving the locality information of the original training data, we introduce a locality constraint on the atoms of the learned dictionary, which has been shown to be more effective and robust [22]. A viable way to encourage similar atoms to have similar profiles is to minimize the following problem:
$$\min_{Z} \sum_{i=1}^{K}\sum_{j=1}^{K} M_{ij}\,\|\hat{z}_i - \hat{z}_j\|_2^2, \tag{7}$$
where $M$ is a similarity matrix, defined in (8) on the basis of $\mathrm{kNN}(d_i)$, the $k$ nearest neighbors of atom $d_i$, and a parameter $\delta$. After some derivation, we obtain the following equivalent formulation of (7):
$$\min_{Z} \operatorname{tr}(Z^T L Z), \tag{9}$$
where $L = T - M$ is a graph Laplacian matrix, $T = \mathrm{diag}(t_1, \ldots, t_K)$ and $t_i = \sum_{j=1}^{K} M_{ij}$. We can observe that the graph Laplacian matrix $L$ is defined on the learned dictionary $D$. As a result, $L$ is updated whenever $D$ is updated in the dictionary learning process. Therefore, the graph Laplacian matrix $L$ can inherit the manifold structure of the training samples.
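The construction of the graph Laplacian on the atoms can be sketched as follows. The exact similarity definition in (8) is not reproduced in this text, so the sketch assumes a common choice: a heat kernel $\exp(-\|d_i - d_j\|_2^2/\delta)$ on a symmetrized $k$-nearest-neighbor graph of the atoms. The helper names `atom_laplacian` and `locality_penalty` are hypothetical. The test checks the identity behind the step from the pairwise penalty on profiles to the trace form: for a symmetric $M$, $\sum_{i,j} M_{ij}\|\hat{z}_i - \hat{z}_j\|_2^2 = 2\operatorname{tr}(Z^T L Z)$ with $L = T - M$.

```python
import numpy as np


def atom_laplacian(D, k=3, delta=1.0):
    """Graph Laplacian L = T - M built on the atoms (columns) of D.

    Assumed similarity (illustrative only): heat kernel
    exp(-||d_i - d_j||^2 / delta) between atoms that are k-nearest
    neighbours in either direction, and 0 otherwise (symmetric).
    """
    K = D.shape[1]
    sq = np.sum(D ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * D.T @ D, 0.0)
    np.fill_diagonal(dist2, np.inf)        # an atom is not its own neighbour
    nn = np.argsort(dist2, axis=1)[:, :k]  # k nearest atoms of each atom
    mask = np.zeros((K, K), dtype=bool)
    mask[np.arange(K)[:, None], nn] = True
    mask |= mask.T                         # symmetrise the kNN graph
    M = np.where(mask, np.exp(-dist2 / delta), 0.0)
    T = np.diag(M.sum(axis=1))             # degree matrix, t_i = sum_j M_ij
    return T - M, M


def locality_penalty(Z, M):
    """sum_{i,j} M_ij ||z^i - z^j||^2 over the profiles (rows) of Z."""
    diff = Z[:, None, :] - Z[None, :, :]   # (K, K, n) pairwise profile differences
    return float(np.sum(M * np.sum(diff ** 2, axis=2)))
```

Because $D$ changes at every iteration, a function like `atom_laplacian` would be called again after each dictionary update, which is how $L$ tracks the learned atoms rather than the raw training samples.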

B. LCDL-SV MODEL
Apart from the locality constraint on the atoms, a support vector discriminative term is incorporated into our proposed method to facilitate the subsequent classification stage. The purpose of this term is to force the coefficients of different classes to be separated with a maximum margin. Intuitively, when the coefficients are separated by a hyperplane, a large margin between different classes increases the confidence of classification. Moreover, the parameters of the SVM (i.e., $U$ and $b$) can be learned in our dictionary learning process. The support vector discriminative term has the same formulation as in (4). Thus, the objective function of our proposed LCDL-SV is formulated as (10), where $\lambda_1$ and $\lambda_2$ are two balancing parameters.

IV. OPTIMIZATION
In this section, we adopt an alternating strategy to solve the LCDL-SV model. The alternating minimization scheme can be partitioned into the following three sub-problems.
Update Z: Fix the other variables and update $Z$ by solving problem (11). The optimization of $Z$ in (11) can be performed column by column, which is formulated as (12).

To facilitate the optimization process, we employ the quadratic hinge loss function to approximate the original one. The quadratic hinge loss function is defined in (13), where $t$ denotes the iteration number. When $t = 1$, (12) degenerates into problem (14), which has the closed-form solution (15). When $t \ge 2$, (12) can be rewritten as (16), which also has a closed-form solution, given by (17).
Update D: When the other variables are fixed, the sub-problem with respect to $D$ is (18), which is a least squares problem with quadratic constraints. Here we employ the Lagrange dual [56] to solve (18), and the Lagrangian of (18) is formulated as (19), where $\delta = [\delta_1, \delta_2, \ldots, \delta_K]$ and $\delta_k$ is the Lagrange multiplier corresponding to the $k$-th equality constraint ($\|d_k\|_2^2 - 1 = 0$).
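The exact forms of the $Z$ sub-problems (14) and (15) are not reproduced in this text, so the following is a hypothesis for illustration only. If, at $t = 1$, the support vector term is absent and the locality term enters column-wise as $\lambda_1 z_i^T L z_i$ (the column-wise form of $\operatorname{tr}(Z^T L Z)$), then the sub-problem $\min_{z_i} \|x_i - D z_i\|_2^2 + \lambda_1 z_i^T L z_i$ has the closed-form minimizer $z_i = (D^T D + \lambda_1 L)^{-1} D^T x_i$. Every column shares the same system matrix, so the sketch solves all columns at once. Since $L$ is only positive semidefinite, $D^T D + \lambda_1 L$ can be singular when $K$ exceeds $m$; the optional `eps` ridge is an extra safeguard of this sketch, not part of the method.

```python
import numpy as np


def update_codes_first_iter(X, D, L, lam1, eps=0.0):
    """Column-wise minimiser of ||x_i - D z_i||^2 + lam1 * z_i^T L z_i (assumed form).

    All columns share the system matrix D^T D + lam1 * L, so they are solved
    together. eps adds an optional ridge eps * I (off by default), an
    assumption of this sketch in case the system matrix is singular.
    """
    K = D.shape[1]
    A = D.T @ D + lam1 * L + eps * np.eye(K)
    return np.linalg.solve(A, D.T @ X)
```

Setting the gradient $2D^T(Dz_i - x_i) + 2\lambda_1 L z_i$ to zero gives exactly the linear system solved above, which is what the test verifies.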
Defining a diagonal matrix $\Delta$ with diagonal elements $\Delta_{kk} = \delta_k$, (19) can be reformulated as (20). By setting the first-order derivative of (20) with respect to $D$ to zero, we obtain the solution (21) for $D$. To speed up the optimization process, we discard $\Delta$ in the final formulation, which is given in (22). After $D$ is updated, we update the graph Laplacian matrix $L$ by using (8).
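The dictionary step can be sketched under the assumption that (18) is $\min_D \|X - DZ\|_F^2$ s.t. $\|d_k\|_2^2 = 1$ for all $k$, consistent with the equality constraints stated above. The Lagrangian is then $\|X - DZ\|_F^2 + \operatorname{tr}(D\Delta D^T) - \operatorname{tr}(\Delta)$, its gradient with respect to $D$ is $-2(X - DZ)Z^T + 2D\Delta$, and setting it to zero gives $D = XZ^T(ZZ^T + \Delta)^{-1}$; discarding $\Delta$ leaves the least-squares update $D = XZ^T(ZZ^T)^{-1}$. The function name `update_dictionary` is hypothetical, and the linear system is solved in transposed form rather than forming an explicit inverse.

```python
import numpy as np


def update_dictionary(X, Z, delta=None):
    """D = X Z^T (Z Z^T + Delta)^{-1}, with Delta = diag(delta).

    delta=None drops Delta (the simplification described in the text),
    which requires Z Z^T to be invertible.
    """
    G = Z @ Z.T
    if delta is not None:
        G = G + np.diag(delta)
    # D G = X Z^T  <=>  G D^T = Z X^T, because G is symmetric
    return np.linalg.solve(G, Z @ X.T).T
```

Dropping $\Delta$ means the unit-norm constraint is no longer enforced by the update itself; whether LCDL-SV renormalizes the atoms afterwards is not stated here, so a caller may want to inspect the column norms of the result.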
Update U and b: When the other variables are fixed, (10) with respect to $U$ and $b$ boils down to problem (23), which is a multi-class linear SVM problem and can be solved by the SVM solver presented in [57]. Because the objective function in (10) is non-convex, the algorithm is not guaranteed to converge to the global minimum. However, satisfactory solutions can be obtained as the objective function decreases. The convergence curve of LCDL-SV on the Extended Yale B database is plotted in Fig. 4. Algorithm 1 outlines the optimization process of our proposed LCDL-SV.

Algorithm 1
    3: Construct the graph Laplacian matrix $L$ by using (8)
    4: for $i = 1$ to $n$ do
    5:     Update $z_i$ by using (15) and (17)
    6: end for
    7: Update the dictionary by using (22)
    8: for $c = 1$ to $C$ do
    9:     Update $u_c$ and $b_c$ by solving (23)
    10: end for
    11: end while
    Output: $D$, $U$ and $b$

When the dictionary learning process is completed, we perform classification as follows. For a test sample $x_{\text{new}}$, we first obtain its coding vector as $z = P x_{\text{new}}$. Then the regularized residual for the $c$-th class is computed from $D_c$ and $z_c$, which are the sub-dictionary and the coding vector associated with the $c$-th class, respectively. Moreover, the output of the learned SVM classifier is computed for each class. Finally, the identity of $x_{\text{new}}$ is determined by combining the regularized residuals with the SVM outputs through a weighting parameter $\eta_2$.
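The classification stage can be illustrated with a hypothetical sketch; the exact projection $P$, the regularized residual and the fusion rule are not reproduced in this text, so every concrete choice below is an assumption. The sketch codes the test sample by ridge regression, $z = (D^T D + \eta_1 I)^{-1} D^T x$ (the role of $\eta_1$ in Section V-F suggests such a form), associates each atom with one class through a hypothetical `atom_labels` array, uses the plain class-wise residual $\|x - D_c z_c\|_2^2$ as a stand-in for the regularized residual, and picks the class minimizing residual minus $\eta_2$ times the SVM score.

```python
import numpy as np


def classify(x, D, atom_labels, U, b, eta1, eta2):
    """Hypothetical fusion of class-wise residuals and SVM scores.

    atom_labels[k] is the class that atom k is associated with (assumed);
    U is (K, C) with u_c as columns and b is (C,).
    """
    K = D.shape[1]
    # ridge coding z = (D^T D + eta1 I)^{-1} D^T x  (assumed form of P x)
    z = np.linalg.solve(D.T @ D + eta1 * np.eye(K), D.T @ x)
    C = U.shape[1]
    residuals = np.empty(C)
    for c in range(C):
        idx = atom_labels == c
        # plain class-wise residual, a stand-in for the regularized residual
        residuals[c] = np.sum((x - D[:, idx] @ z[idx]) ** 2)
    scores = U.T @ z + b  # one-vs-rest SVM outputs
    # hypothetical rule: a small residual and a large SVM score both favour class c
    return int(np.argmin(residuals - eta2 * scores))
```

The weight `eta2` controls how strongly the SVM output can override the residual: with zero SVM scores the residual alone decides, while a sufficiently large score margin can flip the decision.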

V. EXPERIMENTAL RESULTS
In this section, we conduct experiments on five publicly available databases, i.e., the Extended Yale B database [58], the AR database [59], the Scene 15 dataset [60], the Caltech 101 dataset [61] and the LFW database [62]. We compare LCDL-SV with SRC [63], D-KSVD [13], LC-KSVD [14], FDDL [7], SVGDL [16] and two recently proposed ADL approaches, i.e., CADL [41] and SADL [39]. To validate the effectiveness of employing both the regularized residual and the SVM, we also report the results of LCDL-SV using only the regularized residual for classification and of LCDL-SV using only the learned multi-class SVM for classification, which are denoted by LCDL-SV (Res) and LCDL-SV (SVM), respectively. Besides the classification accuracy, we also record the training and testing time of the competing methods. SRC directly employs all the training data as the dictionary, so we do not report its training time. The difference be-
There are five parameters in our proposed method, i.e., $\theta$, $\lambda_1$, $\lambda_2$, $\eta_1$ and $\eta_2$. In all experiments, $\theta$ is set to 0.2, and the other four parameters are determined by cross-validation, with $\lambda_1$ and $\lambda_2$ selected from $\{10^{-6}, 10^{-5}, \ldots, 10^{-1}\}$. The optimal values on each dataset are recorded in Table 1. For a fair comparison, we tune the parameters of the competing approaches to achieve their best performance.
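The search over $\lambda_1$ and $\lambda_2$ on the candidate set $\{10^{-6}, \ldots, 10^{-1}\}$ can be sketched as an exhaustive grid search. The callback `score_fn` is hypothetical: it stands for whatever cross-validation routine trains LCDL-SV with the given pair and returns a validation accuracy, and the function names are assumptions of this sketch.

```python
import itertools

import numpy as np

# candidate set {1e-6, 1e-5, ..., 1e-1} quoted in the text
LAMBDA_GRID = np.logspace(-6, -1, num=6)


def grid_search(score_fn, grid1, grid2):
    """Return the (lam1, lam2) pair with the highest validation score.

    score_fn(lam1, lam2) is a user-supplied (hypothetical) function that
    trains on the training folds and returns validation accuracy.
    """
    best, best_score = None, -np.inf
    for lam1, lam2 in itertools.product(grid1, grid2):
        s = score_fn(lam1, lam2)
        if s > best_score:
            best, best_score = (lam1, lam2), s
    return best, best_score
```

With six candidates per parameter the search costs 36 training runs per fold, which is why coarse, logarithmically spaced grids are the usual choice for regularization weights.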

A. EXTENDED YALE B
The Extended Yale B database contains 2414 frontal face images of 38 subjects, with about 64 images per person; some example images are shown in Fig. 5. Following the experimental setting in [16], all images are cropped to 54×48 and then reduced to a dimension of 300 by PCA. We randomly select 20 images per person as the training set and use the remaining images as the test set. The dictionary has 380 atoms, which corresponds to an average of 10 atoms per subject. Experimental results are summarized in Table 2. We can observe that the proposed LCDL-SV achieves the highest recognition accuracy. Moreover, LCDL-SV is much faster than FDDL in the training phase, and the training time of LCDL-SV is comparable to that of SVGDL. Owing to the ADL framework, CADL and SADL are efficient on this dataset. Because SVGDL and LCDL-SV (SVM) classify test samples with the jointly learned multi-class SVM alone, their testing time is lower than that of our proposed LCDL-SV. Nevertheless, by fusing the regularized residual and the learned multi-class SVM, LCDL-SV outperforms all the competing approaches in recognition accuracy, and it is more efficient than SRC and FDDL in terms of testing time. It should be noted that the accuracies of SADL and SRC reported in [39] are 96.35% and 96.51%, respectively. The differences lie in two aspects. On the one hand, in [39], each image of 192×168 pixels is projected onto a 504-dimensional vector by random projection, while we use cropped images of 54×48 pixels and employ PCA to reduce them to a dimension of 300. On the other hand, half of the images (i.e., 32 images) per subject are used for training in [39], while 20 images per person are used for training in our experiments.
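The PCA step can be sketched as follows, with images stored as columns. Fitting PCA on the training split only and then projecting both splits is an assumption of this sketch (the text does not say which samples PCA is fitted on), and the names `fit_pca` and `apply_pca` are hypothetical.

```python
import numpy as np


def fit_pca(X_train, dim):
    """Fit PCA on the training columns; return the mean and an (m, dim) basis."""
    mu = X_train.mean(axis=1, keepdims=True)
    # left singular vectors of the centred data are the principal directions
    U, _, _ = np.linalg.svd(X_train - mu, full_matrices=False)
    return mu, U[:, :dim]


def apply_pca(X, mu, basis):
    """Project column-vectorised samples onto the PCA basis."""
    return basis.T @ (X - mu)
```

In the Extended Yale B setting, each column would be a 54×48 = 2592-dimensional image, and `fit_pca(X_train, 300)` would yield the 300-dimensional features; with 20 training images for each of the 38 subjects, the 760 training columns are enough to supply 300 principal directions.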

B. AR
The AR database has more than 4000 face images of 126 subjects with variations in facial expression, illumination conditions and occlusions. Fig. 6 shows example images from the database. In our experiments, we use a subset of 2600 images of 50 male and 50 female subjects from the database. As in [14], each 165×120 face image is projected onto a 540-dimensional vector by random projection. For each person, 20 images are randomly selected for training and the remaining for testing. The learned dictionary has 500 atoms, namely five atoms per person. Table 3 lists the recognition accuracy and computing time of all compared methods. Notice that 20 atoms per class are exploited in [41], while only five atoms per class are used in our experiments.
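A random projection of the kind referred to above can be sketched as follows. The Gaussian entries and unit-norm rows are assumptions of this sketch (a common construction for random-face features) and may differ from the matrix used in [14]; the practical point is that the same matrix, fixed here by the seed, must be applied to both training and test images. The function name `make_projection` is hypothetical.

```python
import numpy as np


def make_projection(in_dim, out_dim, seed=0):
    """Random projection matrix with i.i.d. N(0, 1) entries and unit-norm rows.

    The distribution and row normalisation are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((out_dim, in_dim))
    R /= np.linalg.norm(R, axis=1, keepdims=True)
    return R
```

For the AR setting, `R = make_projection(165 * 120, 540)` maps each 19,800-dimensional vectorized image to 540 dimensions via `R @ X`.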
One can see that LCDL-SV has the best recognition accuracy and is more efficient than FDDL in both the training and testing phases. Moreover, on this database and the Extended Yale B database, LCDL-SV (SVM) achieves better results than SVGDL, which demonstrates that the locality constraint on atoms does promote the classification performance of SDL approaches on these two face databases.

C. SCENE 15
The Scene 15 dataset contains 15 natural scene categories covering a wide range of indoor and outdoor scenes, such as bedroom, office and mountain; example images from this dataset are shown in Fig. 7. For a fair comparison, we employ the 3000-dimensional SIFT-based features used in LC-KSVD [14]. Following the common experimental settings, we randomly select 100 images per category as training data and use the remaining images for testing. The learned dictionary has 450 atoms. Experimental results are shown in Table 4. The recognition accuracy of LCDL-SV is 99.0%, which outperforms all the compared methods. CADL performs the second best on this dataset, followed by SADL and SVGDL. Moreover, LCDL-SV is 15 times faster than FDDL in the training stage. The confusion matrix of LCDL-SV is depicted in Fig. 8, in which the diagonal elements are dominant. It can be seen that LCDL-SV attains 100% recognition accuracy for the suburb, forest and inside-city categories.
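The per-class accuracies read off a confusion matrix such as the one in Fig. 8 are its diagonal entries once each row is normalized by the number of test samples of that class. A minimal NumPy sketch (the function name is hypothetical, and every class is assumed to have at least one test sample so that no row sum is zero):

```python
import numpy as np


def confusion_matrix(y_true, y_pred, C):
    """Row-normalised confusion matrix: entry (i, j) is the fraction of
    class-i test samples predicted as class j."""
    cm = np.zeros((C, C))
    np.add.at(cm, (y_true, y_pred), 1.0)  # count each (true, predicted) pair
    return cm / cm.sum(axis=1, keepdims=True)
```

Under this normalization, "100% accuracy for a category" means that the corresponding diagonal entry equals 1 and the rest of that row is 0.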

D. CALTECH 101
The Caltech 101 dataset is widely used for object classification and has 102 classes (i.e., 101 object classes and one background class). The number of images per category is unbalanced, varying from 31 to 800, and in total this dataset contains 9144 images. For a fair comparison, we also employ the 3000-dimensional SIFT-based features used in LC-KSVD [14]. Following the common experimental protocol, we randomly choose 5, 10, 15, 20, 25 and 30 samples per category for training and test on the remaining images. We repeat this process 10 times with different splits of training and test images and record the average classification accuracy. Table 5 summarizes the classification results, and Table 6 lists the training and testing time when 5 samples per category are used for training. As can be seen from Table 5, LCDL-SV consistently outperforms the other competing approaches in all cases. Compared with SVGDL, LCDL-SV (SVM) does not always achieve better accuracy (e.g., when the number of training samples per category is 25). This indicates that using the locality constraint alone cannot guarantee the best performance. Combined with the proposed classification scheme, LCDL-SV exhibits its advantage over other dictionary learning approaches. From Table 6, we can observe that the training time of LCDL-SV is only one-sixteenth of that of FDDL, and LCDL-SV is faster than SRC in the testing phase.
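The evaluation protocol (a fixed number of training samples per category, the rest for testing, repeated over several random splits and averaged) can be sketched as below. The callback `evaluate` is hypothetical and stands for training a model on the given indices and returning its test accuracy; the function name is also an assumption. Drawing a fixed number per class keeps the training set balanced even though Caltech 101 categories range from 31 to 800 images.

```python
import numpy as np


def mean_accuracy_over_splits(y, n_train, evaluate, n_splits=10, seed=0):
    """Average accuracy over random per-class train/test splits.

    evaluate(train_idx, test_idx) is a hypothetical callback returning test
    accuracy. rng.choice raises ValueError for any class with fewer than
    n_train samples.
    """
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_splits):
        # n_train random indices from every class, without replacement
        train = np.concatenate([
            rng.choice(np.flatnonzero(y == c), n_train, replace=False)
            for c in np.unique(y)
        ])
        test = np.setdiff1d(np.arange(y.size), train)  # all remaining samples
        accs.append(evaluate(train, test))
    return float(np.mean(accs)), float(np.std(accs))
```

Reporting the standard deviation alongside the mean makes it easier to judge whether differences between methods exceed the split-to-split variation.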

E. DEEP FEATURES
In this subsection, a subset of the LFW database is used to evaluate our proposed LCDL-SV and other competing approaches on deep features. This subset contains 1251 images of 86 subjects, with 11 to 20 images per person. All images are converted to grayscale, cropped and resized to 32×32; some example images are shown in Fig. 10. Five images per subject are randomly selected as training samples and the remaining images are used as test samples. The pre-trained VGG16 and VGG19 [64] models are employed to extract deep features, and FC6 in

F. PARAMETER SENSITIVITY ANALYSIS
As mentioned earlier, five parameters need to be determined in our proposed method, i.e., $\theta$, $\lambda_1$, $\lambda_2$, $\eta_1$ and $\eta_2$. For $\lambda_1$ and $\lambda_2$, we find that relatively small values (e.g., $10^{-5}$) allow our proposed method to achieve satisfactory results in pattern classification tasks. $\eta_1$ is used to obtain the coding coefficients of test samples and is usually set to $10^{-3}$. Across datasets, the suitable value of $\eta_2$, which fuses the regularized residual with the output of the multi-class SVM, varies over a relatively wide range. The above observations can therefore be treated as a rule of thumb for selecting the parameters of the proposed LCDL-SV. To investigate parameter sensitivity, we carry out experiments on the Extended Yale B database with the same experimental settings as in Section V-A. When analyzing one parameter, we fix the other two. Firstly, we fix $\lambda_2$ and $\eta_2$ and examine how the performance changes with varying $\lambda_1$; Fig. 11 (a) plots the recognition accuracy with varying $\lambda_1$. Similarly, Figs. 11 (b) and 11 (c) plot the results of varying $\lambda_2$ and $\eta_2$, respectively. As can be seen from Fig. 11 (a), when $\lambda_1$ increases from $10^{-5}$ to $10^{-3}$, the accuracy of LCDL-SV increases gradually. However, the performance of LCDL-SV degrades when $\lambda_1$ is larger than 0.01. From Fig. 11 (b), we can see that LCDL-SV achieves stable performance when $\lambda_2$ is in the range $[10^{-7}, 10^{-4}]$. As $\lambda_2$ increases further, the performance drops to some extent. A larger $\lambda_2$ will reduce the discriminative ability of the support vector term, leading to degraded performance. From Fig. 11 (c), one can see that the accuracy of LCDL-SV increases as $\eta_2$ grows from 1 to 5 and then declines as $\eta_2$ continues to increase. On the Extended Yale B database, LCDL-SV achieves the best performance when $\eta_2$ is set to 5.

VI. CONCLUSION
In this paper, we propose a locality constraint dictionary learning with support vector discriminative term (LCDL-SV) for pattern classification. In contrast with traditional methods in which the graph Laplacian matrix is derived from the original training data, we preserve the locality of atoms on the basis of the learned dictionary. By introducing a support vector discriminative term into the formulation of LCDL-SV, a classifier is jointly learned in our dictionary learning procedure. More importantly, the regularized residual and the multi-class SVM are jointly employed to classify the test samples. Experimental results on face, scene and object datasets validate the effectiveness of LCDL-SV, which outperforms several state-of-the-art dictionary learning approaches, e.g., FDDL, SVGDL, CADL and SADL. Discriminative analysis dictionary learning methods have aroused considerable interest due to their efficiency and efficacy. In future work, we will develop a new discriminative ADL approach and apply it to other classification scenarios, such as action recognition and texture classification.