Semi-Supervised Boosting Using Similarity Learning Based on Modular Sparse Representation With Marginal Representation Learning of Graph Structure Self-Adaptive

Semi-supervised boosting aims to improve the classification performance of a given classifier by exploiting a large number of unlabeled data. In a semi-supervised boosting strategy, unlabeled samples are assigned pseudo labels according to their similarities to the labeled samples, and the unlabeled samples whose pseudo labels have high confidence are selected as labeled samples. Good similarities help assign more appropriate pseudo labels to the unlabeled samples, and the selected samples with pseudo labels are then used as labeled samples to train the next ensemble classifier. Learning good, discriminative similarities between unlabeled and labeled samples is therefore of central importance to the performance of semi-supervised boosting. This article presents a semi-supervised boosting framework that learns similarities via modular sparse representation, employing a marginal regression function with probabilistic graph-structure adaptation. Discriminative regression-target analysis, graph-structure adaptation, robust modular sparse representation, and semi-supervised boosting are seamlessly incorporated into a joint framework. Instead of the conventional zero-one target matrix, which greatly restricts the freedom of regression fitting and degrades the regression results, the framework learns marginal regression targets from the data, improving the interclass separation of the learned representation. Meanwhile, a regularization term based on probabilistic connection knowledge is used to construct an adaptively optimized graph regularizer, which improves the intra-class compactness of the learned representation. In addition, modular sparse representation learning improves the robustness of the learned representation.
Experimental results on four datasets, covering face and object recognition, show that the recognition rates of the proposed method are significantly better than those of other state-of-the-art methods.


I. INTRODUCTION
With the development of electronic equipment, more and more visual image data and non-visual text data are produced on the Internet and in daily social communication. Most of the generated data is unclassified or unlabeled, so it is difficult to apply supervised methods to image and document classification. Therefore, semi-supervised learning (SSL) [1] has attracted increasing attention in machine learning and data mining. The core idea of SSL, especially semi-supervised classification, is to learn a classification model from both labeled and unlabeled data. For a given classifier, the goal of semi-supervised boosting is to improve its classification performance by using the supervised information of the labeled data and its relationship with the unlabeled data. In particular, incorporating unlabeled data into existing boosting methods [2]-[5] yields better performance of the boosted classifier.

(The associate editor coordinating the review of this manuscript and approving it for publication was Wen Chen.)

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the semi-supervised boosting strategy, a similarity measure is required, as shown in Figure 1: a new ensemble classifier is trained by selecting reliable unlabeled samples according to their similarity. Because the Euclidean distance is simple to calculate, most boosting strategies use it to measure the similarity between samples [3]. However, similarity (or distance-metric) learning plays an important role in boosting tasks for two reasons: first, it is usually assumed that two samples with high similarity belong to the same class; second, because data are nonlinear, the Euclidean distance cannot capture the nonlinear structure of the data, especially in high-dimensional spaces. In the semi-supervised boosting strategy, the Euclidean distance is used to compute a Gaussian-kernel similarity, but how to set the kernel width of the Gaussian kernel remains an open problem.
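To make the kernel-width issue concrete, the following minimal NumPy sketch computes the pairwise Gaussian-kernel similarity and shows how strongly the result depends on the width parameter (the function name and example data are illustrative, not from the paper):

```python
import numpy as np

def gaussian_similarity(X, sigma):
    """Pairwise Gaussian-kernel similarity S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)          # guard against tiny negative values from round-off
    return np.exp(-d2 / (2.0 * sigma**2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
S_narrow = gaussian_similarity(X, sigma=0.5)   # small width: nearby points barely similar
S_wide = gaussian_similarity(X, sigma=10.0)    # large width: almost everything similar
```

The same pair of samples receives a similarity of about 0.14 with a narrow kernel and about 0.99 with a wide one, which illustrates why a fixed, hand-set kernel width is problematic.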
Recently, in the field of adaptive data-similarity learning, representations based on sparse coding have received extensive attention [6]-[8]. Therefore, this article uses sparse representation as the similarity measure.
In sparse representation learning, the discriminative information and effective visualization of the data make it easier to extract informative features when constructing a classifier or other predictors. As is well known, traditional sparse representations and low-rank representations (LRRs) cannot meet the requirements of real-time applications because of their heavy computational cost [9], [10]. In addition, the learned data representations still fail to capture the discriminative attributes of the latent explanatory factors of inputs from different objects. Moreover, in many cases image recognition is complicated by occlusion, for example, a face image occluded by sunglasses, a headdress, a scarf, a mask, facial hair, or a hand. In this case, recognition methods that use local image information have an advantage over whole-image histogram features: features extracted from occluded areas are lost, but features extracted from non-occluded areas are preserved and may be sufficient for accurate image classification. The classification decision is usually obtained by a nearest-neighbor algorithm, a support vector machine, or a boosting strategy. To address these problems, this article proposes a modular sparse representation learning method (MSPASEMIBOOST) with marginal structure representation, which computes the similarity between images efficiently and effectively for the semi-supervised boosting strategy. First, each image is divided into several smaller blocks. Then, on each set of image blocks, the similarity between blocks is computed with a simple and effective sparse representation using marginal regression targets. Finally, a weighted voting algorithm combines the similarity results learned from each block set into an overall similarity.
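The first step, dividing an image into smaller blocks, can be sketched as follows (a minimal NumPy sketch; the function name and block grid are illustrative assumptions, since the paper does not fix a particular partition):

```python
import numpy as np

def split_into_blocks(image, rows, cols):
    """Split a 2-D image into rows*cols equally sized, non-overlapping blocks,
    returned in row-major order."""
    h, w = image.shape
    bh, bw = h // rows, w // cols
    blocks = []
    for i in range(rows):
        for j in range(cols):
            blocks.append(image[i*bh:(i+1)*bh, j*bw:(j+1)*bw])
    return blocks

img = np.arange(64).reshape(8, 8)
blocks = split_into_blocks(img, 2, 2)   # four 4x4 blocks
```

Each block is then processed independently, which is also what allows the block-level similarity problems to be solved in parallel.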
Instead of using a fixed 0-1 matrix as the regression target, a self-adjusting regression target with a near-optimal margin constraint is constructed directly, so that the regression result is measured more accurately. The purpose of probabilistic graph-structure adaptation is to capture the latent structure encoded by data connectivity, which in turn guides the construction of the marginal regression target. In addition, the regression results are further predicted in a latent subspace of the data to capture latent correlation patterns. The resulting problem has a closed-form solution for each sub-problem and can be solved by an efficient iterative algorithm. Finally, pseudo labels are assigned to the unlabeled data according to the similarity confidence; unlabeled data with high confidence are selected and added to the training set to train a weak classifier, and the trained weak classifiers are combined into the final classifier. Extensive experiments demonstrate the recognition ability and effectiveness of the proposed MSPASEMIBOOST method on different recognition tasks. In summary, the main contributions of the MSPASEMIBOOST framework are as follows: 1) A modular sparse representation semi-supervised boosting framework is proposed that jointly performs flexible self-adjusting marginal target analysis, discriminative subspace construction, and adaptive probabilistic graph-structure learning. The resulting data representation has clear discriminative ability and a near-optimal margin, and the method obtains good recognition results.
2) Adaptive graph-structure learning captures the probabilistic connectivity between each pair of samples in the regression task. The inherent structure of the data is estimated from the information shared within the data. In addition, a linear structure predictor learns to use both the original and latent correlation information of the data to predict the regression targets reliably.
3) Each image is segmented into several smaller blocks, and the similarity between blocks is computed through marginal visual representation learning based on joint flexible self-adjusting marginal target analysis, discriminative subspace construction, and an adaptive probabilistic graph structure. Especially when part of the image is occluded, local features are shown to improve on the results obtained with whole-image features. Once a similarity decision is obtained from each block, weighted voting combines these decisions into the final decision. Moreover, each block-level similarity problem is much smaller in scale and can be solved faster in parallel.
4) The final decision is used as the similarity measure for semi-supervised boosting.
The rest of this article is organized as follows. Section II briefly reviews related work. Section III describes the proposed MSPASEMIBOOST and its optimization algorithm. Section IV reports extensive experimental results, and Section V concludes.

II. RELATED WORK
We first introduce the notation used in this article. Matrices are denoted by capital letters, such as X. The element in the ith row and jth column of matrix X is denoted X_ij. Lowercase letters denote column vectors, such as x.
The Frobenius norm of matrix X is defined by ||X||_F^2 = tr(X^T X) = tr(X X^T), where tr(·) is the trace operator.
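The trace identity above is easy to verify numerically; the following NumPy check (an illustrative sketch, not from the paper) confirms both equalities on a small matrix:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
fro_sq = np.linalg.norm(X, 'fro')**2         # ||X||_F^2 = sum of squared entries
assert np.isclose(fro_sq, np.trace(X.T @ X))  # tr(X^T X)
assert np.isclose(fro_sq, np.trace(X @ X.T))  # tr(X X^T)
```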
The goal of semi-supervised boosting [2]-[5] is to improve the classification performance of a given classifier by using the supervised information of the labeled data and its relationship with the unlabeled data. The strategy is shown in Figure 1.
Given a dataset X = [x_1, x_2, ..., x_n] = [X_l, X_u] of n samples, there are n_l labeled samples X_l = [x_{l1}, x_{l2}, ..., x_{l n_l}] and n_u unlabeled samples X_u = [x_{u1}, x_{u2}, ..., x_{u n_u}], with n = n_l + n_u. The label subset of the n_l labeled samples is y_l = [y_{l1}, y_{l2}, ..., y_{l n_l}], the label set is l = [1, 2, ..., C], and C is the total number of classes. Each y_{li} is a vector: if x_{li} belongs to the kth class, then the kth element of y_{li} is 1, i.e., y_{li}(k) = 1, and all other elements are 0, so y_{li} = [0, ..., 0, 1, 0, ..., 0]. The label subset of the unlabeled data is y_u = [y_{u1}, y_{u2}, ..., y_{u n_u}], and the labels of the whole dataset are y = [y_l; y_u]. S = [S_{i,j}]_{n×n} is a symmetric similarity matrix, where S_{i,j} ≥ 0 is the similarity between x_i and x_j. An unlabeled sample x_u receives the same label as the labeled sample with the highest similarity to it. The matrix S_{lu} denotes the similarities between labeled and unlabeled data, and L(Y, S_{lu}) measures the label inconsistency between labeled and unlabeled data. Two unlabeled samples with the highest similarity are assumed to share the same label; the matrix S_{uu} denotes the similarities between unlabeled data, and U(Y_u, S_{uu}) measures the label inconsistency among unlabeled data. The loss function F(y, S) is then formed from these two terms, and our goal is to find the labels y_u that minimize F(y, S).
Specifically, the loss function is given in (1), where w_1 and w_2 are weights balancing the importance of the labeled and unlabeled data; following [2], we set w_1 = 1/(n_l n_u) and w_2 = 1/(n_u n_u). The two terms of (1) are given in (2) and (3). Let h_t(x) denote the classifier trained by the learning algorithm in the tth iteration, with h(x_i) = y_{li} for i = 1, ..., n_l, and let H(x) denote the ensemble classifier, given in (4), where α_t is the weight of the weak classifier h_t(x). F(y, S) is minimized iteratively by an upper-bound optimization method similar to [2]; the upper bound of (1) is shown in (5). The quantity p_{j,k} defined in (6) can be interpreted as the confidence that the unlabeled sample x_j belongs to the kth class (k = 1, 2, ..., C). As (6) shows, the higher the similarity between an unlabeled sample and the labeled data, the greater the confidence p_{j,k}. Unlabeled data with high confidence are assigned pseudo labels and selected to train the next classifier. The ensemble classifier is updated by adding the weak classifier generated in the current iteration, multiplied by its corresponding weight; when the weight α < 0, the iteration stops. The weight α is computed by equation (7). From (6), similarity plays an important role in selecting unlabeled data to train the next ensemble classifier.
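The pseudo-labeling step can be sketched as follows. This is a deliberate simplification: the exact form of the confidence p_{j,k} in Eq. (6) is not reproduced here; instead the confidence of class k is approximated by the total similarity of the unlabeled sample to the labeled samples of class k, normalized over classes (function name and data are illustrative):

```python
import numpy as np

def pseudo_label_confidence(S_lu, y_l, C):
    """Approximate confidence p[j, k] that unlabeled sample j belongs to class k:
    total similarity to labeled samples of class k, normalized so each row of p
    is a probability distribution (a simplification of Eq. (6))."""
    n_l, n_u = S_lu.shape
    p = np.zeros((n_u, C))
    for k in range(C):
        p[:, k] = S_lu[y_l == k].sum(axis=0)
    p /= p.sum(axis=1, keepdims=True)
    return p

S_lu = np.array([[0.9, 0.1],    # labeled sample 0 (class 0) vs. 2 unlabeled samples
                 [0.8, 0.2],    # labeled sample 1 (class 0)
                 [0.1, 0.9]])   # labeled sample 2 (class 1)
y_l = np.array([0, 0, 1])
p = pseudo_label_confidence(S_lu, y_l, C=2)
pseudo = p.argmax(axis=1)       # pseudo label = most confident class
```

Unlabeled samples whose maximum confidence exceeds a threshold would then be added to the training set, as described above.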
However, the similarity measure in most semi-supervised boosting methods depends on prior knowledge. When only a small amount of labeled data is available, there is no reliable model-selection procedure, so it is difficult to determine the best parameters of a similarity measure from prior knowledge [11], [12]. In recent years, representations based on sparse coding have attracted much attention [6]-[8]. To obtain a flexible similarity matrix and better performance, Wang et al. [13] used sparse representation to learn similarities and proposed sparse-representation-based similarity learning for semi-supervised boosting. Their similarity-learning objective is given in (8), where the matrix D ∈ R^{d×C} is obtained from the data X = [x_1, ..., x_n] ∈ R^{d×n}, d is the sample dimension, n is the number of samples, and C is the number of classes. However, the global sparse representation in that algorithm has high computational complexity and cannot capture the locally distinguishable latent attributes of the data, especially when images are occluded. A modular sparse representation can weaken feature extraction in occluded areas, strengthen it in non-occluded areas, and achieve more accurate image recognition. To address these problems, this article uses modular sparse representation to learn similarities for semi-supervised boosting; it introduces structural predictor learning, which improves the interclass separation of the learned representation by imposing low-rank regularization, and adaptive graph-structure learning, which improves the intra-class compactness of the learned representation by constructing an adaptive probabilistic graph [14].
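The basic building block of such sparse-representation similarity is the sparse code of a sample over a dictionary. The exact objective of Eq. (8) is not reproduced here; the following sketch solves the standard l_1-regularized coding problem with ISTA (iterative soft thresholding), a common choice, and the dictionary and signal are synthetic illustrations:

```python
import numpy as np

def ista_lasso(D, x, lam=0.1, n_iter=500):
    """Sparse code o minimizing 0.5*||x - D o||^2 + lam*||o||_1 via ISTA."""
    L = np.linalg.norm(D, 2)**2          # Lipschitz constant of the smooth part
    o = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ o - x)            # gradient of the least-squares term
        o = o - g / L
        o = np.sign(o) * np.maximum(np.abs(o) - lam / L, 0.0)  # soft threshold
    return o

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 10))
D /= np.linalg.norm(D, axis=0)           # unit-norm dictionary atoms
x = 1.5 * D[:, 3] - 0.8 * D[:, 7]        # x is exactly sparse over D
o = ista_lasso(D, x, lam=0.05)
```

The recovered code concentrates its mass on the two atoms actually used to build x, which is the property that makes sparse codes usable as class-discriminative similarities.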
To improve the interclass separation and the intra-class compactness of the learned representation, the similarity-learning objective is defined as follows, where S = [S_1, ..., S_n], S_{i l_i} is the value at position l_i of S_i, J is the margin constant, the rank of matrix D is s (s < min(C, n)), F ∈ R^{d×s}, Z ∈ R^{s×C}, and γ, λ_1, and β are non-negative parameters. The constraints 0 < P_ij < 1 and P e_n = e_n guarantee that P is a transition probability matrix, each row of which is a probability distribution.

The problem of combining decisions from different image blocks under modular sparse representation has attracted wide attention [15]. The main idea is to combine the decisions of different sub-image blocks using a voting scheme. A typical existing scheme is majority voting [15]: each classifier casts one vote for the class with the lowest representation error, and no other class receives a vote; the decision center sums the votes of each class and assigns the image to the class with the most votes. Building on majority voting, the Borda count was proposed in [16]: each class is given a weight by ranking the representation errors of all classifiers from low to high, and the weight of a class is the number of classes whose representation errors are higher than its own [16]. For a sub-image y_k (k = 1, ..., M), the class producing the smallest representation error receives weight (C - 1), the class with the next-smallest error receives weight (C - 2), and so on. For each class j (j = 1, 2, ..., C), let N_j^(i) denote the number of times the jth class is ranked in the ith position over all sub-images y_m (m = 1, ..., M).
The total number of votes obtained by the jth class is the weighted sum of these counts, where λ_i (i = 1, ..., C) is the weight of a class ranked in the ith position; in the Borda count, λ_i = C - i. The class that obtains the most votes is taken as the classifier output, which is called the Borda-count voting SRC decision ĉ. The Borda count is easy to implement, but it does not account for the differences between the classifiers of the individual sub-images when computing the weights [16]. Although it is well known that some parts of the image carry more information for recognition, the classifier treats every sub-image equally. In addition, the relative difference between the representation errors of different classes on a region is not reflected in the votes allocated to each class: no matter how close the representation errors of two classes are on that region, their vote counts differ by 1. Moreover, the original voting-weight model cannot be applied directly in a semi-supervised boosting framework. To solve this problem, this article uses the sigmoid function to compute a credibility weight for each sub-image, thereby reducing the impact of occluded and contaminated modules on recognition.
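The Borda-count rule described above can be sketched directly from its definition (λ_i = C - i, i.e. the best-ranked class gets C - 1 votes); the error matrix below is an illustrative example:

```python
import numpy as np

def borda_votes(errors):
    """errors: (M, C) array of representation errors of C classes on M sub-images.
    Each sub-image ranks classes by error (lowest first); the class in 0-based
    rank position i receives weight C - 1 - i. Returns total votes per class."""
    M, C = errors.shape
    votes = np.zeros(C)
    for m in range(M):
        order = np.argsort(errors[m])        # classes from lowest to highest error
        for i, cls in enumerate(order):
            votes[cls] += C - 1 - i
    return votes

errors = np.array([[0.1, 0.5, 0.9],   # sub-image 1: class 0 has the lowest error
                   [0.4, 0.2, 0.8],   # sub-image 2: class 1 has the lowest error
                   [0.1, 0.3, 0.7]])  # sub-image 3: class 0 has the lowest error
votes = borda_votes(errors)
winner = int(np.argmax(votes))
```

Note how class 2's near-miss errors earn it zero votes everywhere, illustrating the coarseness of rank-based voting criticized in the text.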
For each sub-image y_k (k = 1, ..., M) of a sample y, sparse representation is used to solve for the sparse coefficient vector O_{y_k}, the absolute values of the coefficients corresponding to the training samples of the ith class (i = 1, 2, ..., C) are summed, and the class coefficient of the ith class in the kth sub-image of sample y is constructed,
where φ_i(O_{y_k}) is the coefficient vector corresponding to the training samples of the ith class in the kth sub-image of sample y.
The sub-image class vector b_{y_k} is constructed from the coefficients of all C classes in the kth sub-image of the sample.
Equation (14) is used to calculate the sparsity s_{y_k} of each sub-image y_k (k = 1, ..., M) of sample y.
When there is only one non-zero element in b_{y_k}, s_{y_k} attains its maximum value 1; when all elements of b_{y_k} have the same non-zero value, s_{y_k} attains its minimum value 0.
The residual error r_{y_k} of each sub-image y_k (k = 1, ..., M) of sample y is calculated using the l_2 norm.
The sigmoid function is used to calculate the confidence level w^s_{y_k} of the sparsity of each sub-image y_k (k = 1, ..., M) of sample y,
where s_1 and s_2 are two security thresholds on the sparsity. The sigmoid function is likewise used to calculate the confidence level w^r_{y_k} of the residual of each sub-image y_k (k = 1, ..., M) of sample y,
where r_1 and r_2 are two security thresholds on the residual. Figure 2 shows how the credibility weights w^s_{y_k} and w^r_{y_k} change with s_{y_k} and r_{y_k}, respectively. Sub-images that are neither occluded nor contaminated satisfy (s_{y_k} ≥ s_2) ∧ (r_{y_k} ≤ r_1), while sub-images with occlusion or contamination satisfy (s_{y_k} ≤ s_1) ∨ (r_{y_k} ≥ r_2). The credibility weight of a sub-image is therefore obtained by equation (18).
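A sketch of this weighting scheme follows. It is built on stated assumptions: the exact forms of Eqs. (14)-(18) are not reproduced, so the sparsity is implemented with Hoyer's measure (which equals 1 for a single non-zero entry and 0 for all-equal entries, matching the properties stated above), and the logistic weight centered between the two security thresholds is an assumed parameterization:

```python
import numpy as np

def hoyer_sparsity(b):
    """Sparsity in [0, 1]: 1 when b has one non-zero entry, 0 when all entries
    are equal and non-zero (an assumed stand-in for Eq. (14))."""
    C = b.size
    l1, l2 = np.abs(b).sum(), np.linalg.norm(b)
    return (np.sqrt(C) - l1 / l2) / (np.sqrt(C) - 1.0)

def sigmoid_weight(v, lo, hi, increasing=True):
    """Logistic confidence centered between two security thresholds (assumed form)."""
    mid, scale = (lo + hi) / 2.0, (hi - lo) / 8.0
    z = (v - mid) / scale
    return 1.0 / (1.0 + np.exp(-z if increasing else z))

b_clean = np.array([0.0, 0.95, 0.02, 0.03])      # coefficients dominated by one class
b_occluded = np.array([0.25, 0.26, 0.24, 0.25])  # nearly uniform: ambiguous sub-image
s_clean, s_occ = hoyer_sparsity(b_clean), hoyer_sparsity(b_occluded)
w_s_clean = sigmoid_weight(s_clean, 0.3, 0.7, increasing=True)  # high sparsity -> trusted
w_s_occ = sigmoid_weight(s_occ, 0.3, 0.7, increasing=True)
w_r_clean = sigmoid_weight(0.1, 0.2, 0.8, increasing=False)     # low residual -> trusted
w_r_occ = sigmoid_weight(0.9, 0.2, 0.8, increasing=False)
```

A clean sub-image (sparse coefficients, low residual) receives a weight near 1 on both criteria, while an occluded one is driven toward 0, which is exactly the behavior Figure 2 depicts.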

III. PROPOSED MODULAR SPARSE SEMI-SUPERVISED BOOSTING LEARNING

A. BASIC IDEA
In light of the related work, the idea of this article rests on the following considerations: (1) In [16], the weights of the sparse representations of image blocks were calculated by the Borda voting weighting algorithm. Although the Borda count is easy to implement, it does not consider the differences between the classifiers corresponding to the image blocks [16]. In addition, the relative differences between the representation errors of different classes within the same region are not reflected in the votes allocated to each class. We use the sigmoid function to compute a credibility weight for each sub-image, improving the Borda voting weighting algorithm, and apply it in semi-supervised boosting learning.
(2) In semi-supervised boosting learning, structure-prediction learning is introduced to improve the interclass separation of the learned representation by imposing low-rank regularization, and adaptive graph-structure learning is introduced to improve the intra-class compactness of the learned representation by constructing an adaptive probabilistic graph.
Based on the above analysis and existing work, we combine low-rank regularized structure-prediction learning, sparse representation with adaptive graph-structure learning, and the improved sparse-weight model for the image modules to improve the similarity-learning model of semi-supervised boosting, and we propose the adaptive modular sparse semi-supervised boosting (MSPASEMIBOOST) learning algorithm.
Following [14], the MSPASEMIBOOST learning model can be expressed as follows, where ℓ(·) is the discriminant loss function measuring the error between the predetermined target and the predicted result, Ω(D) is the structure-prediction term controlling the complexity of matrix D, and Φ(D) is the graph-adaptive term controlling the smoothness of matrix D. The parameters λ_1 and λ_2 balance the importance of the three terms.

B. ALGORITHM STEPS
An iterative optimization algorithm is therefore proposed. To express the main algorithm steps clearly, the key algorithms are defined and described first.
(1) Key algorithm 1: calculate D and its two factor matrices F and Z, the connection probability matrix P of the labeled data samples, and the matrix S. The specific formulas are given in [14].
where γ, λ_1, λ_2, and β are regularization parameters. 1) D and its two factor matrices F and Z are calculated by an iterative process.
The specific calculation formula is obtained by referring to [14].
where E is the graph Laplacian matrix of P, defined as E = T - P, with T a diagonal matrix whose main diagonal elements are the column sums of P, that is, T_ii = Σ_{j=1}^n P_ji.
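This Laplacian construction can be sketched directly from the definition (the example matrix P is illustrative; it is row-stochastic as required by the constraint P e_n = e_n):

```python
import numpy as np

def graph_laplacian(P):
    """E = T - P, where T is diagonal and T_ii is the i-th column sum of P,
    as defined in the text."""
    T = np.diag(P.sum(axis=0))   # column sums on the diagonal
    return T - P

P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])  # each row is a probability distribution
E = graph_laplacian(P)
```

By construction every column of E sums to zero, the standard sanity check for a graph Laplacian built this way.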
Update Z: using the constraint F^T F = I, the solution for Z is obtained by setting the first derivative ∂Γ/∂Z of the objective Γ with respect to Z to zero. Update D: substituting Z into the objective function, the objective is re-expressed as:
2) Following [14], S is updated from P, D, F, and Z. Removing the constant terms independent of S, the objective function of (20) can be rewritten as shown below, where W = D^T X and ξ = Σ_{j≠c} g_j (ψ(g_j) > 0). 3) Fixing S, D, F, and Z, the connection probability matrix P of the labeled data samples is updated iteratively,
where p_i and t_i are the ith rows of P and T, respectively, k is the number of nearest neighbors, and t̃_i is the vector t_i sorted in ascending order.
(2) Key algorithm 2: iterative computation of the modular sparse dictionaries for the labeled data samples. The convergence criterion of key algorithm 2 is that the number of iterations reaches 30 or |Γ_{t+1} - Γ_t|/Γ_t < 0.001, where Γ_t is the value of the objective function at the tth iteration. The main steps of algorithm 2 are shown in Table 1.
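The stated stopping rule (30 iterations or relative objective change below 0.001) can be sketched as a reusable loop skeleton; the toy objective below is purely illustrative:

```python
def run_until_converged(step, obj, max_iter=30, tol=1e-3):
    """Iterate `step` until max_iter iterations are reached or the relative
    change of the objective drops below tol (the criterion of key algorithm 2).
    Returns the number of iterations performed."""
    prev = obj()
    for t in range(max_iter):
        step()
        cur = obj()
        if abs(cur - prev) / abs(prev) < tol:
            break
        prev = cur
    return t + 1

# toy objective whose improvement halves each step
state = {"v": 1.0, "d": 0.5}
def step():
    state["v"] -= state["d"]
    state["d"] *= 0.5
def obj():
    return state["v"] + 1.0   # kept positive so the relative change is well defined

iters = run_until_converged(step, obj, max_iter=30, tol=1e-3)
```

For this geometrically converging toy objective the relative change falls below 0.001 after 10 iterations, well before the 30-iteration cap.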
(3) Key algorithm 3: calculate the modular-sparse-representation similarity of the unlabeled test samples from the learned modular sparse dictionaries D_1, D_2, ..., D_M. The main steps of algorithm 3 are shown in Table 2.
Based on the above key algorithms, the main steps of MSPASEMIBOOST are shown in Table 3.

IV. EXPERIMENTAL RESULTS AND ANALYSIS
To evaluate the performance of the proposed MSPASEMIBOOST algorithm, we compare it with 12 mainstream data-representation learning methods. Three real face databases (EXTENDED YALEB, CMU PIE, and AR) and the COIL-100 object recognition database are selected as datasets.
A. EXPERIMENTAL DATASET
1) EXTENDED YALEB FACE DATABASE
The EXTENDED YALEB face database contains 2414 frontal face images of 38 subjects, each of whom has about 64 images taken under different lighting conditions. The main challenge of this database is handling the varying lighting conditions and facial expressions. Figure 3 shows a set of images from the EXTENDED YALEB face database.

2) CMU PIE FACE DATABASE
The CMU PIE face database includes 41368 face images of 68 subjects. The experiments were conducted on images of five poses (C05, C07, C09, C27, and C29), with 170 images per subject. Figure 4 shows a set of images from the CMU PIE face database.

3) AR FACE DATABASE
The AR face database contains 4000 color face images of 126 subjects with varying illumination, disguises, and facial expressions. Each subject provided 26 face images taken in two sessions under different conditions. In this experiment, a subset of 2600 images of 50 female and 50 male subjects was selected. As in [17], each image is projected to 1540 dimensions using a randomly generated zero-mean normal matrix. Figure 5 shows a set of images from the AR face database.

4) COIL-100 DATABASE
The COIL-100 database consists of 100 objects under different illumination conditions and viewed from different angles over a full 360-degree rotation. Each image is resized to a 32 × 32 matrix. The challenge of this database is that the images were taken from different angles. Figure 6 shows a set of images from the COIL-100 database.

B. EXPERIMENTAL RESULTS AND COMPARATIVE ANALYSIS
For EXTENDED YALEB and CMU PIE, 10, 15, 20, and 25 images were randomly selected from each subject as training samples, and the rest were used as test samples. For the AR dataset, 8, 11, 14, and 17 images per subject were used for training, with the rest for testing. For the COIL-100 dataset, 10, 15, 20, and 25 images per object were used for training, with the rest for testing. We compare the proposed MSPASEMIBOOST with CAPSVM [18], PROCRC [19], DSRM [20], RLSL [21], DLSR [22], RPCA [23], LLRR [24], SLRM [25], PCE [26], MSRL [14], SEMIBOOST [2], and XGBOOST [27] to evaluate its classification performance. All experiments were repeated 20 times, and the average recognition rates were calculated. The average recognition results are shown in Tables 4-7, and the relation between the number of training samples per subject and the average recognition accuracy on the four datasets is given in Figs. 7-10. Table 4 lists the results of the 13 compared methods on the EXTENDED YALEB face database; the results of the first 12 methods are taken from their respective papers. Fig. 7 shows the results of some typical algorithms. As Table 4 and Figure 7 show, the proposed MSPASEMIBOOST method achieves the highest average recognition rate, verifying its effectiveness; it improves on the other algorithms by about 2% on the EXTENDED YALEB face database, and the recognition rate increases with the number of training samples. Table 5 lists the results of the 13 compared methods on the CMU PIE face database; the results of the first 12 methods are taken from their respective papers. Fig. 8 shows the results of some typical algorithms.
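The evaluation protocol described above (randomly pick a fixed number of training images per class, test on the rest, repeat 20 times, average the accuracy) can be sketched as follows; the `evaluate` callback stands in for training and testing an actual classifier, and the dummy labels are illustrative:

```python
import random

def average_recognition_rate(labels, n_train_per_class, n_repeats, evaluate):
    """Repeat the paper's protocol: randomly select n_train_per_class samples
    of each class for training, test on the rest, and average the accuracy
    returned by evaluate(train_idx, test_idx) over n_repeats runs."""
    by_class = {}
    for idx, c in enumerate(labels):
        by_class.setdefault(c, []).append(idx)
    rates = []
    for _ in range(n_repeats):
        train, test = [], []
        for idxs in by_class.values():
            chosen = random.sample(idxs, n_train_per_class)
            train += chosen
            test += [i for i in idxs if i not in chosen]
        rates.append(evaluate(train, test))
    return sum(rates) / len(rates)

labels = [0] * 30 + [1] * 30
rate = average_recognition_rate(labels, n_train_per_class=10, n_repeats=20,
                                evaluate=lambda tr, te: 0.9)  # dummy fixed accuracy
```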
As Table 5 and Figure 8 show, the proposed MSPASEMIBOOST method achieves the highest average recognition rate, verifying its effectiveness; it improves on the other algorithms by about 1.6% on the CMU PIE face database. Table 6 lists the results of the 13 compared methods on the AR face database; the results of the first 12 methods are taken from their respective papers. Fig. 9 shows the results of some typical algorithms. As Table 6 and Figure 9 show, MSPASEMIBOOST achieves the highest average recognition rate, improving on the other algorithms by about 3% on the AR face database. Table 7 lists the results of the 13 compared methods on the COIL-100 database; the results of the first 12 methods are taken from their respective papers. Fig. 10 shows the results of some typical algorithms. As Table 7 and Figure 10 show, the MSPASEMIBOOST method again achieves the highest average recognition rate, which verifies its effectiveness. From Tables 4-7 and Figures 7-10, the proposed MSPASEMIBOOST method achieves better classification performance in two different application scenarios. From these experimental results, it can be concluded that:

(1) Compared with linear-regression methods such as SLRM, the proposed MSPASEMIBOOST method achieves a significant improvement. This shows the necessity and superiority of learning the prediction targets from a robust latent subspace and an adaptive probabilistic graph structure.
(2) SLRM and the proposed MSPASEMIBOOST method outperform methods based on sparse representation, low-rank representation, and dictionary learning. The main reason is that the latter focus on the best reconstruction of the original data, which does not imply the best discrimination. This result clearly shows that the discriminative information encoded by the margin constraint and the graph regularization plays a positive role in improving intra-class compactness.
(3) DLSR, MSRL, and the proposed MSPASEMIBOOST method achieve good results owing to the relaxation of the regression targets. However, the margin targets of MSRL and MSPASEMIBOOST are learned directly from the data rather than being binary regression targets, and the adaptive graph structure provides a better remedy for over-fitting. This shows that both properties are important for learning marginal visual representations.
(4) SLRM, DSRM, MSRL, and the proposed MSPASEMIBOOST method are generally better than the other methods, which shows the effectiveness of graph-structure learning in predicting targets and in making scattered but visually similar data implicitly share a common target.
(5) The classification recognition rate of the proposed modular-sparse-representation-based MSPASEMIBOOST method is significantly higher than that obtained using global information; MSPASEMIBOOST achieves about a 3% improvement on the AR face database, which demonstrates the effectiveness of the modular approach.
(6) The classification performance of the proposed MSPASEMIBOOST method is superior to that of the other non-boosting methods, benefiting from the boosting strategy of combining several weak classifiers into a highly accurate classifier.
(7) The proposed MSPASEMIBOOST method attains a higher recognition rate than the SEMIBOOST and XGBOOST methods. This benefit comes from using sparse representation to measure the similarity between unlabeled and labeled data, with marginal representation and adaptive graph-structure learning integrated into the sparse representation, which correctly maximizes the multiclass margin and realizes efficient and effective visual representation learning.

V. CONCLUSION
When an image is occluded or contaminated, the residual errors produced by all classes on the contaminated sub-image may be similar, and the residual error of the correct class may not be the minimum, which distorts the final decision. To solve this problem, a semi-supervised boosting framework based on modular sparse representation (MSPASEMIBOOST) is proposed. A similarity algorithm based on modular sparse representation is used within the semi-supervised boosting framework to seamlessly combine the local and global consistency of the regression targets into a common data-representation framework. The margin targets learned from the data provide enough flexibility for the regression fitting task, while the latent information of the data is used to predict the targets. Compared with other representation methods, the data representation learned by the described method is more informative and more discriminative. The problem is solved by an iterative optimization strategy. Moreover, compared with the other algorithms, the proposed MSPASEMIBOOST method achieves improvements of about 2%, 1.6%, 3%, and 4% on the EXTENDED YALEB, CMU PIE, and AR face databases and the COIL-100 database, respectively. The experimental results on the four databases show that our method outperforms the other data-representation methods, demonstrating the effectiveness of the proposed MSPASEMIBOOST method.