Maximum Margin Correlation Filter Based on Data Spatial Distribution Information

In the past decade, object localization and object classification using correlation filters, especially large margin correlation filters that incorporate the support vector machine (SVM), have become a research hotspot. However, large margin correlation filters do not consider the class distribution or the structural features within each class during training, which leaves them easily affected by noise. This paper presents two methods to overcome this drawback: the minimum class variance large margin correlation filter (MCVLMCF) and the minimum class locality preserving variance correlation filter (MCLPVCF). First, MCVLMCF captures the overall structure information of the target through the within-class scatter, and MCLPVCF extracts the spatial features of the samples from the intrinsic manifold structure of the data. Then, we embed these two types of information into the objective function of the large margin correlation filter, fusing the sample spatial features with the large margin principle and correlation filtering, and solve for the filter in the frequency domain. Finally, object localization experiments in real environments and classification experiments on different datasets demonstrate that the proposed methods adapt to complex object changes and perform better than several state-of-the-art methods.


I. INTRODUCTION
Research on computer vision algorithms has been the focus of much activity in computer science. It covers a variety of application areas and academic disciplines, including target classification, target detection, digital image processing, geometric modeling, physics, and mathematics. Among these, target classification and target detection are significant computer vision research and application areas. They are frequently utilized in our lives for vehicle detection [1], pedestrian detection [2], object recognition [3], and face recognition [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Wenjie Feng.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the past years, correlation filters have been investigated for target classification, detection, and tracking [5] because of their noise immunity, shift-invariance, and fast training. Correlation filters have achieved good performance in many pattern recognition applications, including face localization, pedestrian localization, and object tracking [5], and many computer vision methods have been combined with them. Vander et al. [6] proposed the matched filter (MF), an early correlation filter algorithm. The filter is constructed as the conjugate transpose of the two-dimensional Fourier transform of the training image. The structure of the MF is too simple: only the correlation output for the training images is relatively accurate, so it is not practical enough. Nevertheless, applying correlation filters to image processing provided new ideas for subsequent researchers. After a long period of improvement, current correlation filters can be divided into two kinds: synthetic discriminant function filters and optimal correlation output filters. When building the filter, the synthetic discriminant function constrains the output responses of the positive and negative samples to 1 and 0, respectively. The discriminant function is then used to assess whether a sample is positive or negative. Casacent et al. [7] proposed the original synthetic discriminant function (SDF) filter, the earliest filter of this kind, which assigns high output values to part of the positive data in the training set and low output values to the other samples.
However, it is essentially a weighted sum of multiple MFs, and its actual performance does not significantly exceed that of the MF. Kumar et al. [8] proposed the minimum variance synthetic discriminant function (MVSDF) filter, which improves on SDF. MVSDF minimizes the correlation output variance when constructing the filter and adds constraints on the noise output. It suppresses noise and emphasizes low-frequency information, but it also suppresses the peak output, so the correlation peak is insufficiently noticeable. Mahalanobis et al. [9] proposed the minimum average correlation energy (MACE) filter and the unconstrained minimum average correlation energy (UMACE) filter, which add constraints on the correlation output and enhance it. However, they also strengthen the response to noise, so their noise robustness is insufficient and the output peak can be drowned out by the noise output. Mahalanobis et al. [9] also proposed the maximum average correlation height (MACH) filter, which simultaneously minimizes the mean square error of the noise and targets to construct a peak output with a high correlation height. Mahalanobis et al. [10] proposed the unconstrained optimal trade-off synthetic discriminant function (UOTSDF) by fusing MVSDF and MACE, using a balancing parameter to combine the two filters. Kumar et al. [11], [12] proposed the minimum squared error synthetic discriminant function (MSESDF) and the optimal trade-off synthetic discriminant function (OTSDF) by utilizing MVSDF and MACE. Their correlation output is emphasized and the noise is suppressed, but their performance is significantly affected by parameter selection.
Another kind of correlation filter is the optimal correlation output filter. This kind of filter specifies a strong peak at the target or center position of each training image, and then learns a mapping from the training images to the desired output to calculate the filter. The average synthetic exact filter (ASEF), proposed by Bolme et al. [13], constructs a target response for each training image, obtains a weak filter for each image, and then averages these weak filters to obtain the final strong filter. This algorithm performs well in human eye localization and face recognition experiments because the deformation of such image data is generally small. However, the averaging approach loses a large amount of useful image information, and the performance is unsatisfactory when the images vary more and the number of samples is small. Bolme et al. [14] proposed the minimum output sum of squared error (MOSSE) filter, which is constructed by minimizing the sum of squared errors between the actual correlation output and the desired correlation output of the training samples. MOSSE needs fewer training samples than ASEF and places fewer demands on the training set, which speeds up training compared to ASEF. Most later correlation filters are improvements based on this algorithm.
Subsequently, Rodriguez et al. [15] took a different approach to improving the correlation filter: they combined the correlation filter with the support vector machine (SVM) [16] and proposed the maximum margin correlation filter (MMCF). SVM is a classical supervised learning classifier that is widely used in pattern recognition tasks such as face recognition [17] and pedestrian detection [18]. MMCF maximizes the classification margin and minimizes the mean square error, combining the generalization capability of SVM with the localization capability of correlation filters. Thus, the resulting filter performs better than the earlier traditional correlation filters in target detection and target recognition experiments. To improve MMCF, Boddeti et al. [19] proposed the maximum margin vector correlation filter (MMVCF), and Fernandez et al. [20] proposed the zero-aliasing maximum margin correlation filter (ZAMMCF). Compared with MMCF, MMVCF uses vector features, and ZAMMCF introduces a zero-aliasing constraint to eliminate the aliasing problem of correlation filters.
Although MMCF and its subsequent improvements have achieved excellent performance compared with other correlation filters, these methods still have some shortcomings. First, they do not consider the overall distribution information of the samples when constructing the filter and use only the sample points on the boundary. Second, they ignore the fact that the structural information between samples is also an important factor in filter construction. These drawbacks make the filter susceptible to noise during training, so non-targets that closely resemble the targets are easily misclassified. To address these problems, we propose two new methods. For the first problem, we propose the minimum class variance large margin correlation filter (MCVLMCF) using Fisher linear discriminant analysis [21], [22] and SVM. For the second problem, we propose the minimum class locality preserving variance correlation filter (MCLPVCF) using locality preserving projections (LPP) [23], [24], [25] and SVM. The main contributions of our work can be summarized as follows:
(1) We introduce the within-class scatter into the large margin correlation filter, so that the algorithm takes into account both the boundary samples and the overall distribution structure of the samples during training.
(2) Using the sample manifold structure, we combine the locality preserving scatter with the large margin correlation filter, so that the internal structure information of the samples is fully considered.
(3) We investigate the influence of the parameters on the proposed algorithms through experiments, and compare the proposed algorithms with other algorithms through object localization and object classification experiments.

II. MAXIMUM MARGIN CORRELATION FILTER
Before MMCF was proposed, improvements to traditional correlation filters focused on optimizing the correlation output and did not consider combining the filter with other classifiers to improve its discrimination ability. Rodriguez et al. [15] extended the correlation filter by fusing it with SVM. MMCF combines the maximum margin principle of SVM with the correlation filter and uses the classification and generalization capabilities of SVM to further improve the performance of the filter.

A. CORRELATION FILTER
Many correlation filter designs can be interpreted as optimizing a distance metric between an ideal desired correlation output for an input image and the correlation output of the training images with the filter template [26]. Given a dataset $U = \{x_i \mid i = 1, \ldots, N\} \subset \mathbb{R}^{d \times N}$, the correlation filter is defined as follows:
$$\min_{w} \sum_{i=1}^{N} \left\| w \otimes x_i - g_i \right\|_2^2 \tag{1}$$
In Eq.1, $w$ is the filter vector, $x_i$ is the input image vector, and $g_i$ is the ideal desired output vector. The definitions of $g_i$ differ between filters, but in general the center position has a high value and the remaining positions have low values. $\otimes$ represents the cross-correlation operation of the image.
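As a minimal sketch of the cross-correlation operation used in Eq.1, the following NumPy snippet computes a circular cross-correlation through the FFT and checks it against a direct summation. The function names, the circular boundary handling, and the 1-D toy signal are illustrative assumptions and are not taken from the original method.

```python
import numpy as np

def xcorr_fft(w, x):
    """Circular cross-correlation of a real filter w with a signal x via the FFT."""
    # Correlation theorem: correlation in space is a conjugate product in frequency.
    return np.real(np.fft.ifft(np.conj(np.fft.fft(w)) * np.fft.fft(x)))

def xcorr_direct(w, x):
    """Direct circular cross-correlation: out[s] = sum_n w[n] * x[(n + s) % d]."""
    d = len(x)
    return np.array([sum(w[n] * x[(n + s) % d] for n in range(d)) for s in range(d)])

rng = np.random.default_rng(0)
w = rng.standard_normal(16)      # toy filter (assumption: the filter equals the target)
x = np.roll(w, 3)                # the same pattern shifted by 3 samples
out = xcorr_fft(w, x)
peak = int(np.argmax(out))       # index of the correlation peak
```

In this toy case the peak of the correlation output falls at the shift of the pattern, which is the behavior the desired output $g_i$ is meant to encode.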

B. SVM
In image processing, SVM requires feature vectors of the same size for training and prediction. These can be formed by lexicographically scanning the image pixel values, after which the feature vectors of the training images are labeled [27]. Given a dataset $U = \{x_i \mid i = 1, \ldots, N\} \subset \mathbb{R}^{d \times N}$ and sample labels $y_i \in \{1, -1\}$, SVM finds the hyperplane that maximizes the minimum L2-norm distance (also known as the margin) between the hyperplane and any data sample:
$$\min_{w, b} \; w^T w + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i \left( w^T x_i + b \right) \ge 1 - \xi_i, \; \xi_i \ge 0 \tag{2}$$
In Eq.2, $w$ is the normal vector of the classification hyperplane, $T$ represents transposition, $C$ is a given constant that defines the cost of classification errors, $\xi_i$ is the slack variable, and $b$ is the bias or offset from the origin.

C. CORRELATION FILTER THAT INTRODUCES SUPPORT VECTORS
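By combining SVM with the correlation filter, MMCF not only minimizes the mean squared error, but also simultaneously maximizes the minimum Euclidean distance (margin) between the hyperplane and the data points. The function of MMCF is as follows:
$$\min_{w, b} \; \lambda \left( w^T w + C \sum_{i=1}^{N} \xi_i \right) + (1 - \lambda) \sum_{i=1}^{N} \left\| w \otimes x_i - g_i \right\|_2^2 \quad \text{s.t.} \quad y_i \left( w^T x_i + b \right) \ge c_i - \xi_i, \; \xi_i \ge 0 \tag{3}$$
In Eq.3, $\lambda \in (0, 1]$ is the balance parameter, and $c_i = 1$ for true-class images and $c_i = 0$ for false-class images. MMCF takes $w^T w + C \sum_{i=1}^{N} \xi_i$ as the classification criterion and $\sum_{i=1}^{N} \| w \otimes x_i - g_i \|_2^2$ as the localization criterion. Compared with traditional correlation filters, MMCF has significantly improved classification and localization capabilities.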
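To make the two criteria concrete, the sketch below evaluates them for a candidate filter on 1-D signals. It assumes that the criteria are combined as a λ-weighted sum, that each slack variable takes its smallest feasible value, and that the correlation is circular; the function name `mmcf_objective` and its arguments are hypothetical and serve only as an illustration.

```python
import numpy as np

def mmcf_objective(w, b, X, y, G, lam=0.6, C=1.0):
    """Weighted sum of the classification and localization criteria.

    X: (N, d) training signals, y: (N,) labels in {+1, -1},
    G: (N, d) desired correlation outputs. Assumed form, for illustration only.
    """
    c = (y == 1).astype(float)                   # c_i = 1 for true class, 0 for false class
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, c - margins)            # smallest slack satisfying the constraints
    classification = w @ w + C * xi.sum()
    # Circular cross-correlation of w with every x_i, computed through the FFT.
    corr = np.real(np.fft.ifft(np.conj(np.fft.fft(w)) * np.fft.fft(X, axis=1), axis=1))
    localization = np.sum((corr - G) ** 2)
    return lam * classification + (1.0 - lam) * localization
```

With `lam=1.0` the localization term is ignored and only the classification criterion remains; smaller values of `lam` put more weight on the localization criterion.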

III. CORRELATION FILTER BASED ON MINIMUM WITHIN-CLASS VARIANCE
MMCF, MMVCF, and ZAMMCF introduce the maximum margin principle into the correlation filter. Compared with traditional correlation filters, these filters perform better in object detection and object recognition experiments. However, the SVM part of MMCF considers only the sample points on the boundary when establishing the classification hyperplane and ignores the distribution information of the data in each class. Thus, the classification hyperplane obtained by SVM may not be consistent with the actual class distribution, and MMCF effectively uses only the boundary sample points during training.

A. MINIMUM CLASS VARIANCE SUPPORT VECTOR MACHINE
In the classification problem, SVM attempts to obtain the classification hyperplane with the maximum margin, where the margin is defined as the minimum distance from the class boundary samples to the hyperplane. This definition causes the aforementioned problem. Stefanos et al. [28] combined Fisher's linear discriminant analysis (FLDA) [29] with SVM and proposed the minimum within-class variance support vector machine (MCVSVM). Suppose a training sample set $U = \{x_1, \ldots, x_N\} \subset \mathbb{R}^{d \times N}$ whose samples belong to two categories $\{A_+, A_-\}$. The optimization problem of MCVSVM is defined as:
$$\min_{w, b} \; w^T S_\omega w + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i \left( w^T x_i + b \right) \ge 1 - \xi_i, \; \xi_i \ge 0 \tag{4}$$
In Eq.4, $S_\omega$ is the within-class scatter matrix, which is a symmetric positive semi-definite matrix and is defined as follows:
$$S_\omega = \sum_{x_i \in A_+} \left( x_i - m_{A_+} \right) \left( x_i - m_{A_+} \right)^T + \sum_{x_i \in A_-} \left( x_i - m_{A_-} \right) \left( x_i - m_{A_-} \right)^T \tag{5}$$
In Eq.5, $m_{A_+}$ and $m_{A_-}$ are the mean sample vectors of the classes $A_+$ and $A_-$, respectively. If the input image dimension is $d$, the size of $S_\omega$ is $d \times d$. The within-class scatter matrix describes the distribution within each class of sample data. The distance from a sample to the hyperplane in SVM is the Euclidean distance. In contrast, after the within-class scatter is introduced, MCVSVM applies the Mahalanobis distance [29].
Fig.1 shows the classification hyperplanes generated by SVM and MCVSVM, in which circles and rectangles represent two classes of artificial data. SVM takes into consideration only the Euclidean distance between the two classes of samples and is affected by the two sample points in the middle of the dataset, so the obtained decision hyperplane does not conform to the spatial distribution of the samples. MCVSVM, however, introduces the within-class scatter matrix to incorporate the overall spatial distribution information into training, and it is not disturbed by individual outlying sample points.
It can be seen that the classification hyperplane of MCVSVM is more in line with the overall distribution of the data and better addresses the problem that SVM does not consider sample distribution information. Therefore, MCVSVM is more robust than SVM and has better generalization performance. Our improvements to the large margin correlation filter are inspired by this approach.
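The within-class scatter matrix described above can be sketched in a few lines of NumPy. The function name and the row-per-sample layout of `X` are assumptions made for this illustration; the computation is the usual sum of per-class centered outer products.

```python
import numpy as np

def within_class_scatter(X, y):
    """S_w = sum over classes of sum_i (x_i - m_c)(x_i - m_c)^T; X is (N, d)."""
    d = X.shape[1]
    S = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        centered = Xc - Xc.mean(axis=0)   # subtract the class mean vector
        S += centered.T @ centered        # accumulate the class scatter
    return S

# Toy two-class data (illustrative values only).
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
y = np.array([1, 1, -1, -1])
S = within_class_scatter(X, y)
w = np.array([1.0, 0.0])
quad = w @ S @ w   # the term w^T S_w w that replaces w^T w in the SVM objective
```

Replacing $w^T w$ with $w^T S_\omega w$ penalizes directions along which the classes are widely spread, which is how the class distribution enters the margin.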

B. MINIMUM CLASS VARIANCE LARGE MARGIN CORRELATION FILTER
To address the above-mentioned problem of the correlation filter, we propose MCVLMCF, inspired by [28]. Referring to Eq.3, the MCVLMCF model can be established as follows:
$$\min_{w, b} \; \lambda \left( w^T S_\omega w + C \sum_{i=1}^{N} \xi_i \right) + (1 - \lambda) \sum_{i=1}^{N} \left\| w \otimes x_i - g_i \right\|_2^2 \quad \text{s.t.} \quad y_i \left( w^T x_i + b \right) \ge c_i - \xi_i, \; \xi_i \ge 0 \tag{6}$$
In Eq.6, $\lambda$ balances the proportion of the classification criterion and the localization criterion in MCVLMCF, and its value range is $(0, 1]$. For example, when $\lambda = 1$, the localization criterion is ignored and only MCVSVM is applied. The specific structural form of $S_\omega$ is shown in Eq.5; it uses the scatter of the data within each class to obtain the distribution information of the data samples.
Inspired by MCVSVM, MCVLMCF introduces the within-class sample scatter and takes full account of the distribution information of the data, which makes its training more thorough and less easily affected by noise. In addition, the performance of the filter is improved by combining the advantages of the correlation filter with the maximum margin principle of SVM.

C. THE SOLUTION OF MCVLMCF
As shown in Eq.6, MCVLMCF combines two optimization problems, so it is difficult and time-consuming to solve directly. According to the correlation theorem [30], the cross-correlation operation in the correlation filter can be transformed into multiplication in the frequency domain through the Fourier transform, which improves the operation speed. Therefore, following the solution method in [15], we convert the model to the frequency domain and solve it using the Wolfe dual problem.
To facilitate the calculation, we convert the localization criterion and the classification criterion of MCVLMCF separately. The localization criterion is transformed as follows: In Eq.7, $\hat{X}_i$ is a diagonal matrix whose diagonal elements are the elements of $\hat{x}_i$, and $\hat{x}_i$, $\hat{w}$, and $\hat{g}_i$ are the input image vector $x_i$, the filter vector $w$, and the desired output vector $g_i$ converted to the frequency domain by the Fourier transform, respectively. We ignore the factor $\frac{1}{d}$ because it is a constant and has no effect on the filter. $\hat{g}_i$ in the frequency domain can be rewritten as: In Eq.8, $\dagger$ represents the conjugate transpose and $\mathbf{1} = [1 \cdots 1]^T$ is a $d$-dimensional vector. Substituting Eq.8 into Eq.7, the criterion can be rewritten as follows:
In addition, because the dimension $d$ of the image $\hat{x}_i$ is generally very large, we follow the processing method in [15] to reduce the computational complexity: we formulate MCVSVM in the frequency domain using the fact that inner products are only scaled by $\frac{1}{d}$ [31]. Thus, Eq.4 can be rewritten as follows: In Eq.10, $\hat{S}_\omega$ is the within-class scatter matrix in the frequency domain, and the definition of this matrix is similar to the previous definition.
We can now express the MCVLMCF multi-criteria shown in Eq.6 in the frequency domain as follows: Eq.11 can be solved through the Lagrangian function $L(\hat{w}, b, \xi, \alpha_i, \beta_i)$: In Eq.12, $\alpha_i$ and $\beta_i$ are the Lagrange multipliers. The Karush-Kuhn-Tucker (KKT) conditions imply that, for Eq.12, the following holds: By substituting the results in Eq.13 into Eq.12, the dual problem of MCVLMCF can be obtained: In Eq.14, $\alpha = [\alpha_1, \ldots, \alpha_N]^T$. Eq.14 can be solved by sequential minimal optimization (SMO) [32]. SMO heuristically selects one pair of multipliers $\alpha_i$ at a time and optimizes it, instead of solving for the entire $\alpha$ vector at once. In this way, SMO breaks the large quadratic programming (QP) problem into a series of the smallest possible QP problems, so that it can be solved faster. Once the Lagrange multiplier vector $\alpha^*$ of the dual problem is solved, the filter $\hat{w}^*$ of MCVLMCF in the frequency domain can be obtained using the following formula: In Eq.15, $Y$ is a diagonal matrix with $y_i$ along the diagonal.
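The statement that inner products are only scaled by $\frac{1}{d}$ when moving to the frequency domain can be checked numerically. The short sketch below assumes the unnormalized discrete Fourier transform as implemented by NumPy; the random vectors and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
x = rng.standard_normal(d)   # stand-in for a vectorized image
w = rng.standard_normal(d)   # stand-in for a filter vector

x_hat = np.fft.fft(x)        # unnormalized DFT, as implemented by NumPy
w_hat = np.fft.fft(w)

spatial = w @ x                                   # w^T x in the spatial domain
frequency = np.real(np.vdot(w_hat, x_hat)) / d    # (1/d) * w_hat^dagger x_hat
```

Here `np.vdot` conjugates its first argument, so `frequency` is the conjugate-transpose inner product divided by `d`, and it matches `spatial` up to floating-point error. This is why the constraints $y_i(w^T x_i + b)$ keep the same form in the frequency domain apart from a constant factor.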

IV. MINIMUM CLASS LOCALLY PRESERVING VARIANCE GUIDED CORRELATION FILTER
When building the classification hyperplane, the SVM part of large margin correlation filters such as MMCF takes into account only the margin between boundary samples and ignores the inherent manifold structure of the data space. To address this drawback, we propose the minimum class locality preserving variance correlation filter (MCLPVCF). In contrast to other large margin correlation filters, MCLPVCF incorporates the weighted adjacency graph of the samples and introduces the locality preserving within-class scatter. It takes the structural information and manifold structure of the samples into account while maximizing the classification margin and optimizing the correlation output, so that both the class information and the structure information of the samples are considered during training.

A. MINIMUM CLASS LOCALITY PRESERVING VARIANCE SUPPORT VECTOR MACHINE
To address the problem of SVM mentioned above, Wang et al. [33] used locality preserving projections (LPP) [23], [24], [25] and proposed the minimum class locality preserving variance support vector machine (MCLPV_SVM). The MCLPV_SVM optimization problem is defined as:
$$\min_{w, b} \; w^T Z_w w + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i \left( w^T x_i + b \right) \ge 1 - \xi_i, \; \xi_i \ge 0 \tag{16}$$
In Eq.16, $Z_w$ is the locally preserving within-class scatter matrix, which describes the structural information between data using the method of LPP. LPP is a projection-based dimensionality reduction algorithm that builds an adjacency graph containing the structural information between the data. MCLPV_SVM uses this adjacency graph to introduce the locally preserving within-class scatter matrix. For two-class samples with $K = 1, 2$, its specific construction is as follows:
$$Z_w = \sum_{K=1}^{2} Z_K \tag{17}$$
In Eq.17, $Z_K$ is the locally preserving within-class scatter matrix of the $K$-th class data. It is constructed as follows:
$$Z_K = X_K \left( D_K - W_K \right) X_K^T \tag{18}$$
In Eq.18, $X_K$ is the input image matrix of the $K$-th class data, $D_K$ is a diagonal matrix whose diagonal elements are $D^K_{ii} = \sum_j W^K_{ij}$, $W^K_{ij}$ is an element of $W_K$, and $W_K$ is the adjacency graph weight matrix. $x_{Ki}$ and $x_{Kj}$ are data points of the $K$-th class data, $G$ is the adjacency graph of the dataset D, which denotes the local manifold structure, and $L = D - W$ is the Laplacian matrix of $G$. $G$ has a total of $N$ nodes (that is, data points). If node $i$ is among the $k$ nearest neighbors of node $j$, then $x_{Ki} \in N_k(x_{Kj})$. To reflect the structural information between the data, the weight matrix $W_K$ is constructed with the Gaussian kernel, and its elements are constructed as follows: In Eq.19, $\|\cdot\|$ is the norm and $t$ is the Gaussian kernel parameter. $W_K$ models the local structure of the data manifold by determining whether the data points are neighbors and by weighting them with the Gaussian kernel. Fig.2 describes the decision hyperplanes of SVM and MCLPV_SVM on an artificial dataset.
MCLPV_SVM fully considers the connection between data samples, and thus it can be used to improve the large margin correlation filters.
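The construction of the locally preserving within-class scatter can be sketched as follows. This is a minimal illustration that assumes the standard LPP heat-kernel weights $\exp(-\|x_i - x_j\|^2 / t)$ on a symmetric $k$-nearest-neighbor graph; the exact kernel form, the symmetrization rule, and the function names are assumptions rather than details confirmed by the text.

```python
import numpy as np

def locality_preserving_scatter(Xc, k=3, t=1.0):
    """Z_K = X_K (D_K - W_K) X_K^T for one class; Xc is (n, d), one sample per row.

    Assumes heat-kernel weights exp(-||x_i - x_j||^2 / t) on a symmetric
    k-nearest-neighbor graph; the exact kernel form is an assumption.
    """
    n = Xc.shape[0]
    sq = np.sum((Xc[:, None, :] - Xc[None, :, :]) ** 2, axis=2)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(sq[i])
        neighbors = [j for j in order if j != i][:k]   # k nearest neighbors, excluding i
        W[i, neighbors] = np.exp(-sq[i, neighbors] / t)
    W = np.maximum(W, W.T)        # connect i and j if either is a neighbor of the other
    D = np.diag(W.sum(axis=1))    # D_ii = sum_j W_ij
    L = D - W                     # graph Laplacian
    return Xc.T @ L @ Xc          # (d, d) scatter in the row-per-sample convention

def locality_preserving_within_class_scatter(X, y, k=3, t=1.0):
    """Z_w as the sum of the per-class matrices Z_K."""
    return sum(locality_preserving_scatter(X[y == c], k, t) for c in np.unique(y))
```

Because the graph Laplacian of a non-negatively weighted graph is positive semi-definite, the resulting matrix is symmetric positive semi-definite and can take the place of the within-class scatter in the margin term. The parameters `k` and `t` correspond to the neighborhood size and the Gaussian kernel parameter discussed in the text.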

B. MINIMUM CLASS LOCALITY PRESERVING VARIANCE CORRELATION FILTER
The SVM part of the large margin correlation filters uses the maximum margin of the boundary samples to establish the classification hyperplane, but it ignores the distribution information and internal relationships within each class. As a result, the data information is not fully utilized and the training on the samples is insufficient. To address this problem, and inspired by [33], we propose MCLPVCF as follows:
$$\min_{w, b} \; \lambda \left( w^T Z_w w + C \sum_{i=1}^{N} \xi_i \right) + (1 - \lambda) \sum_{i=1}^{N} \left\| w \otimes x_i - g_i \right\|_2^2 \quad \text{s.t.} \quad y_i \left( w^T x_i + b \right) \ge c_i - \xi_i, \; \xi_i \ge 0 \tag{20}$$
In Eq.20, $Z_w$ is the locally preserving scatter matrix, defined as in Eq.17, Eq.18, and Eq.19. It uses the weighted adjacency graph of the samples to obtain the local manifold structure information. The parameters $t$ and $k$ affect the value of $Z_w$: $t$ changes the values of the elements in the weight matrix, and $k$ determines how many sample points around a sample are treated as its neighbors, which influences how the manifold structure of the data is used.

C. THE SOLUTION OF MCLPVCF
Similar to the solution method in Section III, MCLPVCF is converted first to the frequency domain, then solved by the dual problem, and finally the filter is obtained by the SMO algorithm.
MCLPVCF in the frequency domain is as follows: The dual problem of MCLPVCF is as follows: By solving the dual problem Eq.22 of MCLPVCF, the Lagrange multiplier vector $\alpha^*$ is obtained, and $\hat{w}^*$ can be obtained using the following formula:

V. EXPERIMENTS
The experiments were divided into three parts. First, we briefly describe the process of the two algorithms. Second, we conducted different kinds of experiments to investigate the influence of these parameters. Finally, we report the results of the two algorithms in terms of target recognition and detection. The test environment was an Intel CPU i7 with 16 GB memory and MATLAB 2020a.

A. ALGORITHM PROCESS
Because the two proposed algorithms both build on large margin correlation filters, their steps are similar. First, the training sample images are normalized to obtain the training sample vectors $x_i$, and the Fourier transform is employed to convert $x_i$ to the frequency domain to obtain $\hat{x}_i$. MCVLMCF uses $\hat{x}_i$ and $y_i$ to calculate the within-class scatter matrix $\hat{S}_\omega$, and MCLPVCF calculates the locally preserving scatter matrix $\hat{Z}_w$ by Eq.17, Eq.18, and Eq.19. Then, the matrix $\hat{Z}$ is calculated, the parameters are substituted into Eq.14 or Eq.22, and SMO is used to obtain $\alpha^*$. Finally, we substitute $\alpha^*$ and the other parameters into Eq.15 or Eq.23 to solve for the filter $\hat{w}^*$. In the experiments, the filter is cross-correlated with the images to obtain the correlation output, and the target position or category is then judged from this output. The algorithm process is given as follows.

B. PARAMETER INFLUENCE ON PERFORMANCE OF MCVLMCF
MCVLMCF has two parameters, C and λ, which influence its performance. Therefore, we conducted a fish detection experiment to investigate the influence of these parameters on MCVLMCF. Fish detection is an important research topic in the field of ocean exploration. In this subsection, we test MCVLMCF with different parameters on the Labeled Fishes in the Wild dataset of NOAA Fisheries [34]; Fig.3 shows some images from the dataset. These images were collected by camera equipment deployed on remotely operated submersibles, so the dataset reflects a real environment. We used grid search to find the optimal combination of parameters, and some values can be determined empirically. The test results were also compared with other algorithms, including MMVCF [19], ZAMMCF [20], MOSSE [14], OTSDF [12], MMCF [15], and the axisymmetric shell intersection-based correlation filter (ASICF) [35].
Algorithm: MCLPVCF (steps 3 to 5 of the listing)
(3) Substitute the locally preserving scatter matrix $\hat{Z}_w$, the matrix $\hat{Z}$, the frequency domain image vectors $\hat{x}_i$, and the parameters C and λ into Eq.22, and employ SMO to solve for $\alpha^*$.
(4) Substitute $\alpha^*$, $\hat{Z}_w$, and $\hat{Z}$ into Eq.23 to obtain the frequency domain filter $\hat{w}^*$.
(5) Output: Use the cross-correlation operation between the filter and the test image to obtain the correlation output. Obtain the localization result or the classification result from the correlation output.
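The output step of the algorithm (correlating the filter with a test image and reading the result from the correlation plane) can be sketched as below. The sketch uses the Fourier transform of a plain template as a stand-in for a trained filter $\hat{w}^*$ and assumes circular correlation; the function names are hypothetical.

```python
import numpy as np

def correlation_output(w_hat, image):
    """Correlation plane of a frequency-domain filter w_hat with a 2-D image."""
    return np.real(np.fft.ifft2(np.conj(w_hat) * np.fft.fft2(image)))

def locate_peak(response):
    """Row and column of the largest correlation value, plus the value itself."""
    r, c = np.unravel_index(np.argmax(response), response.shape)
    return int(r), int(c), float(response[r, c])

rng = np.random.default_rng(1)
template = rng.standard_normal((16, 16))          # stand-in for the target appearance
image = np.roll(template, (5, 7), axis=(0, 1))    # target shifted by 5 rows and 7 columns
w_hat = np.fft.fft2(template)                     # placeholder filter, not a trained one
row, col, value = locate_peak(correlation_output(w_hat, image))
```

The peak position gives the localization result; for classification, the peak value would be compared across classes or against a decision rule chosen for the task.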
Because the fish in most of the images are not clear, those images were excluded from the experiment. We randomly selected 80 images from the images containing clear fish as test samples; the remaining images containing clear fish were used as positive samples, and the clear images without fish were used as negative samples. During training, because the images differ in size, the training samples were resized to 160 × 160 pixels to obtain a fixed-size filter. In the test, we compared the detection results with the ground-truth locations and computed the overlap rate with the ground-truth box to calculate the detection accuracy.
In Eq.24, $p_x$, $p_y$ and $p'_x$, $p'_y$ are the horizontal and vertical coordinates of the upper left corners of the ground truth box and the test detection box, respectively; $p_w$, $p_h$ and $p'_w$, $p'_h$ are the widths and heights of the ground truth box and the test detection box, respectively; and $I$ is the overlapping area. The function used to calculate the overlap ratio $O$ is as follows: It follows from Eq.24 and Eq.25 that the overlap rate $O$ must lie in the range [0, 1]. $O$ reflects the degree of overlap between the detection box and the ground truth box (e.g., when $O = 0$, the test detection box does not overlap with the ground truth). A threshold $D$ is set and compared with the overlap rate to determine whether the target is detected.
Table 1 presents the detection accuracy of MCVLMCF for different parameters and thresholds. The test results show that MCVLMCF achieves a certain level of detection accuracy for all parameter combinations. When λ is between 0.3 and 0.7, it has little influence on the detection results of MCVLMCF. When λ = 1, that is, when only the classification criterion is used, the detection accuracy decreases significantly. Table 2 shows the detection accuracy of the various algorithms in the fish detection experiment for different λ values with the remaining settings fixed. Table 3 shows a comparison of the detection accuracy of the various filters for fixed D = 0.3, λ = 0.8, and C = 1 (the parameters set for MCVLMCF, MMVCF, ZAMMCF, and MMCF). In the fish detection experiment, MCVLMCF outperformed the other algorithms after incorporating the data distribution information.
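The overlap rate and the thresholded detection accuracy described above can be sketched as follows. The sketch assumes that the overlap rate is the standard intersection over union of the two boxes, with each box given as (x, y, width, height) and the origin at the upper left corner; the function names are hypothetical.

```python
def overlap_rate(gt, det):
    """Intersection over union of two boxes given as (x, y, w, h), top-left origin."""
    gx, gy, gw, gh = gt
    dx, dy, dw, dh = det
    iw = max(0.0, min(gx + gw, dx + dw) - max(gx, dx))   # width of the intersection
    ih = max(0.0, min(gy + gh, dy + dh) - max(gy, dy))   # height of the intersection
    inter = iw * ih
    union = gw * gh + dw * dh - inter
    return inter / union if union > 0 else 0.0

def detection_accuracy(gts, dets, D=0.3):
    """Fraction of test images whose overlap rate reaches the threshold D."""
    hits = sum(overlap_rate(g, d) >= D for g, d in zip(gts, dets))
    return hits / len(gts)
```

For example, two 2 × 2 boxes offset by one pixel in each direction share an area of 1 out of a union of 7, so they would not count as a detection at D = 0.3.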
The detection accuracy of the different methods under various D values with fixed parameters C = 1 and λ = 0.6 is shown in Fig. 4. For all D values, MCVLMCF had a greater detection accuracy.
A fish detection example is shown in Fig. 5, where the red box represents the MCVLMCF detection box, the blue box represents the MMCF detection box, and the yellow box represents the ground truth box. It is evident that the MCVLMCF detection box has a larger overlap with the ground truth box, indicating that the MCVLMCF detection result is more accurate.
The responses of the same detection image after filtering with MCVLMCF and MMCF are shown in Fig. 6. A comparison of the two figures shows that MCVLMCF generates a higher, more pronounced, and more concentrated sharp peak response at the target. The results in this subsection show that MCVLMCF performs better with the parameters C = 1 and λ = 0.6, so these parameters were adopted in the following experiments.

C. PARAMETER INFLUENCE ON PERFORMANCE OF MCLPVCF
We conducted a handwritten digit classification experiment to examine the influence of the MCLPVCF parameters C, λ, t, and k on the algorithm. We used the MNIST standard handwritten digit dataset [36], created by the National Institute of Standards and Technology, for the experiment. Some images from the MNIST dataset are shown in Fig.7. We randomly selected 400 images from each class as the training set and 600 as the test set, and combined the four parameters in pairs for testing. Fig.8 shows the classification accuracy of MCLPVCF for different parameters λ, t, and k. The classification accuracy for fixed parameters k = 30 and C = 1, for fixed parameters t = 1 and C = 1, and for fixed parameters λ = 0.6 and C = 1 is shown in Fig.8 (a), (b), and (c), respectively.
It can be found from Fig.8(a) that the classification accuracy is poor when t and λ are small, and the accuracy stabilizes when these two parameters increase to certain values. The situation in Fig.8(b) is roughly similar to that in Fig.8(a), but the accuracy drops instead when k and λ are large simultaneously. The accuracy diminishes as the value of k increases, as shown in Fig. 8(c), despite the relatively mild influence of k.
Referring to the parameter settings in [33], Table 4 shows the classification accuracy for different parameters k and C with λ = 0.6 and t = 1 fixed. Overall, the classification accuracy is higher when k = 12, 15, or 30 and C = 1. Table 5 shows the classification accuracy for different parameters t and C with λ = 0.6 and k = 30 fixed. When t = 1 and 1.5, the classification accuracy is higher, and the influence of parameter t on the classification accuracy is substantially smaller than that of parameter k. Table 6 shows the classification accuracy for different parameters λ and C with t = 1 and k = 30 fixed. It can be seen that the classification accuracy is higher when λ is about 0.5 and 0.7.
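The parameter studies in this section amount to evaluating the accuracy over a grid of candidate values and keeping the best combination. A minimal sketch of such a search is given below; the `evaluate` callable stands in for training and testing a filter with the given parameters and is hypothetical, as are the candidate values in the usage example.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Return the best parameter dict and its score over a Cartesian grid.

    evaluate: callable taking keyword parameters and returning an accuracy.
    grid: dict mapping parameter names to lists of candidate values.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:            # keep the first best combination found
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(lam, C):
    """Placeholder scoring function for demonstration; not an experimental result."""
    return -(lam - 0.6) ** 2 - (C - 1.0) ** 2

best, score = grid_search(toy_score, {"lam": [0.3, 0.5, 0.6, 0.7], "C": [0.1, 1.0, 10.0]})
```

In practice `evaluate` would wrap the full training and testing procedure, and additional parameters such as `t` and `k` would simply be added as further keys of the grid.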

D. EYE LOCALIZATION
To compare the performance of the proposed algorithms with that of other algorithms, we conducted a human eye localization experiment. We used the Yale face dataset [37], created by Yale University, which contains 165 face images of 15 individuals. Each individual has 11 images with obvious expression and lighting changes, and the image size is 100 × 80. Some of its images are shown in Fig.9. We randomly selected 45 images from the dataset as the test set. In this experiment, the target was considered to be detected when the overlap rate O ≥ 0.3. Fig.10 shows two examples of MCLPVCF and MCVLMCF in the eye localization experiment, in which the red solid box is obtained by MCLPVCF and the black dotted box is obtained by MCVLMCF. Table 7 shows the localization accuracy of MCLPVCF, MCVLMCF, and the other algorithms. It can be seen that the localization accuracies of MCLPVCF and MCVLMCF are at least 2.2% higher than those of the other algorithms.

E. CLASSIFICATION EXPERIMENT
In this subsection, we report the classification results on multiple datasets. We compared the test results not only with the aforementioned methods but also with the adaptive manifold filter and spatial correlation feature (AMSCF) [38]. AMSCF is a method for hyperspectral image classification, but it can also process ordinary images, and it likewise takes advantage of the data manifold structure.
In this study, seven datasets were tested. First, we employed three datasets for object classification experiments. The COIL20 dataset [39], created by Columbia University, contains 1440 grayscale images of 20 classes of objects. Each object is captured at 5-degree intervals, and each image is 128 × 128 pixels. Fig. 11 shows sample images from this dataset. We randomly selected 20, 25, 30, 35, or 40 images of each class as the training set and used the remaining images as the test set. The test was repeated 40 times and the average value was used as the classification accuracy. Table 8 lists the classification accuracies of the different algorithms on the COIL20 dataset. MCLPVCF achieved most of the best results, and MCVLMCF achieved most of the second-best results. Moreover, with 25 to 40 training samples, MCLPVCF is approximately 0.9%-17% higher and MCVLMCF is approximately 0.1%-15.7% higher than the other algorithms.
The COIL100 dataset [39], shown in Fig. 12, is an extension of the COIL20 dataset and contains 7200 color images of 100 classes of objects. As before, we randomly selected 20, 25, 30, 35, or 40 images of each class as the training set and used the remaining images as the test set. The test was repeated 40 times and the average value was used as the classification accuracy. Table 9 shows the classification accuracies of the different algorithms on the COIL100 dataset. MCLPVCF improves on the other algorithms on the COIL100 dataset, and MCVLMCF also achieves better classification accuracy. The COIL100 dataset has a large number of classes, and although objects look similar in most images, the two proposed methods perform better because they take full advantage of the spatial features of the data during training.
The Caltech101 dataset [40], shown in Fig. 13, was created by the California Institute of Technology and contains 9144 images of 101 categories. We randomly selected 20, 25, 30, 35, or 40 images of each class as the training set and used the remaining images as the test set. Table 10 shows the classification accuracies of the different algorithms on the Caltech101 dataset. MCVLMCF and MCLPVCF achieve most of the best and second-best results.
Then, we used four datasets for the face classification experiments. The Yale face dataset, described above, is shown in Fig. 9. We randomly selected 3, 4, 5, 6, or 7 images of each class as the training set and used the remaining images as the test set. The test was repeated 30 times and the average value was used as the classification accuracy. Table 11 shows the classification accuracies of the different algorithms.
The AR face dataset [41], shown in Fig. 14, contains 3120 images of 120 people. The images have obvious changes in lighting and expression, as well as obvious occlusion by glasses and scarves. Each image is 40 × 50 pixels. We randomly selected 5, 10, 15, 17, or 20 images of each class as the training set and used the remaining images as the test set. The test was repeated 40 times and the average value was used as the classification accuracy. Table 12 shows the classification accuracies of the different algorithms on the AR dataset. When the number of training samples is large, the average classification accuracies of MCLPVCF and MCVLMCF are 2.0%-27.3% and 0.2%-26.1% higher than those of the other algorithms, respectively.
The GT face dataset [42], shown in Fig. 15, contains 750 face images of 50 individuals. These are color images with small changes in illumination but large changes in pose and expression. We randomly selected 8, 9, 10, 11, or 12 images of each class as the training set and used the remaining images as the test set. The test was repeated 40 times and the average value was used as the classification accuracy. Table 13 lists the classification accuracies of the different algorithms on the GT dataset. The average classification accuracy of all algorithms is not high, which is due to the large variation in face pose, expression, and illumination in this dataset. However, MCLPVCF and MCVLMCF still achieve the best and second-best classification accuracies when the number of training samples is larger.
The ORL face dataset [43], created by the University of Cambridge, contains 400 images of 40 individuals with obvious illumination and pose changes. Each image is 92 × 112 pixels. Sample images are shown in Fig. 16. We randomly selected 3, 4, 5, 6, or 7 images of each class as the training set and used the remaining images as the test set. The test was repeated 40 times and the average value was used as the classification accuracy. Table 14 shows the classification accuracies of the different algorithms on the ORL dataset. MCLPVCF and MCVLMCF are 0.4%-37.5% and 0.1%-35% higher than the other algorithms, respectively.
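The evaluation protocol shared by most of the datasets above (a random per-class training selection, repeated runs, and an averaged accuracy) can be sketched as follows. This is a minimal illustration, not the authors' code: `evaluate` is a hypothetical callable that trains on the given training indices and returns the accuracy on the test indices, and the function names and the seed are assumptions.

```python
import random
from collections import defaultdict


def random_class_split(labels, n_train, rng):
    """Pick n_train random samples per class for training; the rest form the test set."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return train, test


def mean_accuracy(labels, n_train, evaluate, repeats=40, seed=0):
    """Average evaluate(train_idx, test_idx) over repeated random splits.

    `evaluate` is a hypothetical stand-in for training a classifier and
    scoring it on the held-out indices.
    """
    rng = random.Random(seed)
    scores = [evaluate(*random_class_split(labels, n_train, rng))
              for _ in range(repeats)]
    return sum(scores) / len(scores)
```

For a dataset shaped like COIL20 (20 classes with 72 images each), `random_class_split(labels, 20, rng)` yields 400 training and 1040 test indices per run.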
We also used an ANOVA test to verify whether the accuracies obtained on the above datasets are credible, and the resulting p-values are well below 0.05. We therefore consider the differences in accuracy between the proposed algorithms and the other methods to be statistically significant. In general, in the object classification and face classification experiments, MCLPVCF achieved most of the best results, while MCVLMCF achieved most of the second-best results and some of the best. Thus, the proposed methods outperformed the other algorithms in terms of classification performance.
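The ANOVA check mentioned above can be illustrated with a small sketch. The text does not specify the ANOVA design, so this assumes a one-way ANOVA in which each group holds the per-run accuracies of one method. The function below computes only the F statistic and its degrees of freedom using the standard library; the p-value would then be read from the F distribution with those degrees of freedom, for example with `scipy.stats.f_oneway`, which returns both the statistic and the p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Assumption: each element of `groups` is a list of per-run accuracies
    for one method. The within-group variance must be non-zero.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F value relative to the F distribution with `(df_between, df_within)` degrees of freedom corresponds to a small p-value, that is, to accuracy differences between methods that are unlikely to arise from run-to-run variation alone.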

F. COMPUTATIONAL COMPLEXITY
The training and test times of the two proposed algorithms and MMCF on the Yale, GT, and ORL face datasets are presented in Table 15. The experiments used 105 training images and 60 test images from the Yale dataset with image dimension d = 60 × 60; 600 training images and 150 test images from the GT dataset with image dimension d = 50 × 60; and 280 training images and 120 test images from the ORL dataset with image dimension d = 60 × 60. The training time in the table is the time used to train one filter template, and the test time is the average time per image. All three methods solve a QP problem and compute Fourier transforms, which account for the majority of their computational cost. However, the two proposed methods require additional computation for the within-class scatter matrix in MCVLMCF and the locality preserving scatter matrix in MCLPVCF, which adds to their training time. Their test times, however, are the same. We believe that it is worthwhile to invest slightly more time in training to achieve better performance.

VI. CONCLUSION
In this paper, we have described the problems with large margin correlation filters, namely that they do not sufficiently consider the sample distribution and do not exploit the internal structure of the samples. To address these problems, we propose two methods, MCVLMCF and MCLPVCF. First, MCVLMCF introduces the within-class scatter into the correlation filter, which takes into account the class distribution of the data while using the large margin principle, and fully utilizes the sample dispersion and its data information so that data of the same class become more compact. At the same time, the excellent generalization performance of the SVM and the localization ability of the correlation filter are used to make the obtained response more consistent with the real situation. Then, through the study of the weighted adjacency graph, MCLPVCF utilizes the structural information between samples, introduces the locality preserving scatter information of the data, takes the distribution information and intrinsic manifold structure of the samples into full consideration during training, and both maximizes the classification margin and optimizes the correlation output. The construction of the filter takes into account the category information and structural information of the samples, which also improves the detection and classification performance of the algorithm. The experimental results show that MCVLMCF and MCLPVCF perform well in localization and classification experiments. Future work will optimize the time complexity of MCVLMCF and MCLPVCF to reduce the expensive training cost. Additionally, we will verify the feasibility of introducing other variants of SVM into large margin correlation filters.