A Novel Active Learning Algorithm for Robust Image Classification

Training samples must be labeled before they can be used to train a classification model, which usually consumes considerable labor and material resources. This problem has recently attracted widespread attention. To reduce the workload of labeling samples, we propose a novel active learning method that uses locally linear reconstruction coefficients to construct a semi-supervised, data-manifold adaptive kernel space. We compare the new method with other sampling approaches on several real-world image datasets, and the experimental results indicate that it achieves better classification performance. In particular, it attains higher classification accuracy when only a few samples are selected to train the classifier.


I. INTRODUCTION
Improving the accuracy of image classification is one of the most important and interesting problems in the field of machine learning [1]-[3]. In recent years, researchers have carried out many image studies, such as human face image classification [4]-[6], handwritten digit image recognition [7]-[9], and remote sensing image classification [10], [11]. Traditional supervised classification models are almost all based on statistical methodology [12], [13]: a user first labels numerous samples, and these labeled samples are then used to train the classification model. However, in many real-world image classification applications, labeling unclassified samples is costly and time-consuming, especially for specialized image recognition problems that require professional prior knowledge. For instance, people who label samples from functional magnetic resonance imaging need some knowledge of biomedicine. It is therefore hard to obtain enough labeled samples to train the classifier, and a shortage of training samples can severely degrade its classification ability.
The associate editor coordinating the review of this manuscript and approving it for publication was Inês Domingues.
Consequently, how to collect the most representative subset of samples from the whole dataset, cut the cost of labeling, and still train an excellent classifier has become one of the most critical problems in data classification [14], [15]. To solve these problems efficiently, many researchers have applied active learning to different kinds of classification tasks [16], [17]. Many studies show that active learning can greatly reduce the labeling workload while achieving higher classification accuracy. It has therefore attracted wide attention in pattern recognition, data mining, and related fields [15]-[17]. With active learning, a learner actively selects the unlabeled samples that contain the most information, passes them to an expert for labeling, and then adds the labeled samples to the training set to train the classifier. This is very different from passive learning, in which all data samples are labeled and processed.
The central problem in active learning is to determine which samples contain the most information. According to their selection strategies, active learning algorithms fall mainly into three kinds: methods based on uncertainty [18], on committees [19], and on generalization error reduction [20], [21]. Many active learning techniques, such as Optimal Experimental Design (OED) [22] and Transductive Experimental Design (TED) [23], [24], have been proposed for image and text classification. Recently, algorithms based on OED have become very popular in machine learning because of their efficiency and because they require no prior knowledge of the samples. OED is related to experimental design whose goal is to minimize measurement error. The three best-known OED criteria are A-optimal design (AOD), D-optimal design (DOD), and E-optimal design (EOD), which optimize the trace, determinant, and eigenvalue of a matrix, respectively [22]. In OED, each sample point is viewed as an experiment and its label as a measurement. Selecting the best samples amounts to designing an experiment in which the parameters of the measurement model have the least variance. However, this approach focuses the optimization target only on the variance of the model parameters; it does not directly optimize the classifier's prediction error on the selected data. To address this shortcoming, researchers proposed the TED algorithm on the basis of OED. Although TED derives from OED, it estimates the expected prediction error on all examples, both labeled and unlabeled. It is important to note that both OED and TED are closely tied to statistics. Choosing the samples that minimize the expected prediction error is equivalent to selecting the data points that best linearly reconstruct the remaining samples.
However, TED attends only to the global and discriminant structure of the data space. It usually discards the intrinsic geometric structure, which helps to increase the classification accuracy of the classifier [25], [26].
To study the underlying geometric structure of the sample space, researchers have designed many manifold learning methods, e.g., ISOMAP [27] and Locality Preserving Projections (LPP) [28]. Although these methods differ from one another, the locally invariant idea is the most significant concept they share [29]. Under this idea, neighbors have common properties, i.e., samples that are close together have the same class attributes. Based on TED and taking into account the locally invariant idea of manifold learning, D. Cai et al. presented a new method named Manifold Adaptive Experimental Design (MAED) that uses the graph Laplacian [30], [31]. It solves active learning problems by performing data classification in a manifold kernel space [32]. The classification accuracy of MAED depends closely on how the graph is constructed. Hence, how to obtain an efficient and effective neighbor graph is the most important problem in the MAED approach. There are many options for constructing the manifold structure graph [31]. In this study, we propose a new manifold adaptive active learning method, called Semi-Supervised Adaptive Experimental Design (SSAED), for image active learning in the semi-supervised setting [33]-[36]. By calculating the locally linear reconstruction coefficients [37] of the sample points with a semi-supervised strategy and incorporating the graph structure into the manifold kernel, the new manifold structure can be used in the learning process. Then, the OED can be efficiently completed in the manifold adaptive kernel space. It is noteworthy that many active learning algorithms pay no attention to class information when building the intrinsic manifold structure. The proposed algorithm effectively addresses this problem.
FIGURE 1. An example of the active learning process. U denotes the unlabeled sample database that has never been tagged, and E the expert system that can correctly label the samples in U. L denotes the labeled sample dataset used to train a classifier, C is a certain kind of classifier, and H is a query function used to query information in the pool U.
The remainder of this article is organized as follows. Section II describes the methods and materials adopted in this study. Section III presents the experimental results and discussion. Finally, Section IV summarizes the study and gives the conclusion.

II. METHOD AND MATERIALS
In human learning, people often first use existing experience to learn new knowledge and then rely on the acquired knowledge to summarize and accumulate experience, so experience and knowledge constantly interact. Similarly, machine learning simulates this process: it first uses existing knowledge to train a model and obtain new knowledge, and then corrects the model as information accumulates. In this way, a more accurate and useful model can be obtained. Active learning selectively acquires a small number of samples that contain the largest amount of information, which differs from passive learning, where all data samples are labeled and processed. In active learning, the learner actively selects the most representative unlabeled samples and passes them to an expert for labeling, then adds the labeled samples to the training dataset to train the classifier [15], [16]. Fig.1 shows the procedure in brief. The learner starts from a small set of initially labeled samples L. The query function H chooses one or a group of the most useful samples from the unlabeled pool U and requests their labels from the supervisor E; the new labels are then used to train the classifier C, and the next round of queries begins. Active learning is a cyclic process that stops only when a certain criterion is reached.
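The cycle described above can be sketched in a few lines of Python. This is only an illustration of the generic pool-based loop, not the selection rule proposed in this paper: the nearest-centroid classifier, the margin-based query function, the labeling budget used as the stopping criterion, and all function names are assumptions made for the example.

```python
import numpy as np

def fit_centroids(X, y):
    """Classifier C: a nearest-centroid model (placeholder, assumed for the sketch)."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, X):
    """Assign each row of X to the class of its closest centroid."""
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

def query_least_confident(model, X, pool, batch):
    """Query function H (assumed): pick pool samples with the smallest distance margin."""
    _, centroids = model
    d = np.linalg.norm(X[pool][:, None, :] - centroids[None, :, :], axis=2)
    if d.shape[1] < 2:                      # only one class seen so far
        return list(pool[:batch])
    d_sorted = np.sort(d, axis=1)
    margin = d_sorted[:, 1] - d_sorted[:, 0]
    return [pool[i] for i in np.argsort(margin)[:batch]]

def active_learning(X, oracle, n_init=4, batch=4, budget=20, seed=0):
    """Run the loop L -> C -> H -> E until `budget` samples are labeled."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    labels = {i: oracle(i) for i in labeled}            # expert E labels the initial set L
    model = fit_centroids(X[labeled], np.array([labels[i] for i in labeled]))
    while len(labeled) < budget:                        # stopping criterion (assumed)
        pool = [i for i in range(len(X)) if i not in labels]   # unlabeled pool U
        if not pool:
            break
        n_query = min(batch, budget - len(labeled))
        for i in query_least_confident(model, X, pool, n_query):
            labels[i] = oracle(i)                       # ask E for the label
            labeled.append(i)
        model = fit_centroids(X[labeled], np.array([labels[i] for i in labeled]))
    return model, labeled
```

Here `oracle` stands for the expert E and is any callable that returns the label of sample `i`; swapping `query_least_confident` for another query function changes the selection strategy without touching the loop.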
The fundamental purpose of active learning is to find the most representative samples. In Euclidean space, let $S = \{s_1, s_2, \ldots, s_i, \ldots, s_n\}$ be a sample dataset, where $s_i$ denotes a sample represented by a vector of some dimension, $i \in [1, n]$, and $n$ denotes the total number of samples. Finding the most representative samples means finding a subset $Q = \{q_1, q_2, \ldots, q_j, \ldots, q_m\} \subset S$ that contains the maximally informative samples, where $m < n$ and $j \in [1, m]$. That is, the highest classification accuracy is obtained when these samples are labeled and used as training data. Seeking the maximally informative samples to label is routinely described as experimental design in statistics [22]. In the dataset $S$, each sample point is usually regarded as an experiment and its class label as a measurement.

A. THE METHOD OF OED
The task of OED is closely related to experimental design that aims to minimize the variance of a parameterized model [22]. Consider the linear function $f(s) = w^T s$ from the measurement model $c = w^T s + \epsilon$, where $f(s)$ denotes the linear function, $c$ the measurement, and $\epsilon$ the measurement noise, which follows a zero-mean normal distribution with variance $\sigma^2$. Assume that there is a set of labeled samples $(q_1, c_1), \ldots, (q_m, c_m)$. The weight vector is then estimated by
$$\hat{w} = \arg\min_{w} \left\{ L(w) = \sum_{j=1}^{m} \left( w^T q_j - c_j \right)^2 \right\}, \tag{1}$$
where $w$ denotes the weight vector and $\hat{w}$ is its maximum likelihood estimate. Naturally, the estimation error is
$$e = \hat{w} - w. \tag{2}$$
Reference [31] points out that this error has zero mean and covariance matrix $\sigma^2 H_w$, where $H_w$ denotes, up to a constant factor, the inverse of the Hessian of the objective function $L(w)$ in (1), and can be written as
$$H_w = \left( \sum_{j=1}^{m} q_j q_j^T \right)^{-1}, \tag{3}$$
where $Q = \{q_1, q_2, \ldots, q_j, \ldots, q_m\}$. OED therefore aims to select the samples that minimize the error governed by the matrix $H_w$. The best-known optimal designs are AOD, DOD, and EOD, which optimize the trace, determinant, and eigenvalue of the matrix $H_w$, respectively [22].
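As a small illustration of how the three design criteria compare candidate subsets, the following Python sketch evaluates the trace, log-determinant, and largest eigenvalue of $H_w$ for a given selection $Q$. It assumes $H_w = (\sum_j q_j q_j^T)^{-1}$ with samples stored as rows; the small ridge term, the use of the log-determinant instead of the determinant, the choice of the largest eigenvalue for the E criterion, and the function name are assumptions added for numerical convenience.

```python
import numpy as np

def design_criteria(Q, ridge=1e-8):
    """Score a selected subset Q (shape (m, d), samples as rows).

    Returns the A, D and E criteria of H_w = (sum_j q_j q_j^T)^{-1}.
    Smaller values mean a lower-variance estimate of w.
    """
    d = Q.shape[1]
    # Q.T @ Q equals sum_j q_j q_j^T; the ridge keeps the inverse defined (assumption).
    H = np.linalg.inv(Q.T @ Q + ridge * np.eye(d))
    return {
        "A": float(np.trace(H)),                    # A criterion: trace of H_w
        "D": float(np.linalg.slogdet(H)[1]),        # D criterion: log-determinant of H_w
        "E": float(np.linalg.eigvalsh(H).max()),    # E criterion: largest eigenvalue of H_w
    }

# Two candidate subsets in two dimensions: a spread-out one and a nearly collinear one.
spread = np.array([[1.0, 0.0], [0.0, 1.0]])
collinear = np.array([[1.0, 0.0], [1.0, 0.1]])
print(design_criteria(spread))
print(design_criteria(collinear))
```

The nearly collinear subset scores worse on all three criteria, which matches the intuition that samples spanning different directions constrain $w$ better.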

B. THE METHODOLOGY OF TED
The TED method, which is essentially based on OED, evaluates the expected prediction error on both labeled and unlabeled examples. Given the sample dataset $S$, minimizing the average expected squared prediction error over $S$ can be equivalently viewed as the simplified optimization problem (4). The main difficulty in (4) is how to search for the subset $Q$; a global solution is hard to obtain because the problem is NP-hard [23]. After a few transformations, (4) can be converted into the following form:
$$\min_{Q,\, a_i} \sum_{i=1}^{n} \left( \left\| s_i - \sum_{j=1}^{m} a_{i,j} q_j \right\|^2 + \xi \left\| a_i \right\|^2 \right), \quad \text{s.t. } Q \subset S,\ |Q| = m, \tag{5}$$
where $a_i = (a_{i,1}, \ldots, a_{i,m})^T$ is a vector of auxiliary reconstruction coefficients that uses a linear combination of the subset $Q = \{q_1, q_2, \ldots, q_j, \ldots, q_m\}$ to fit the data point $s_i$, i.e., the whole dataset $S$ can be reconstructed from the subset $Q$. The regularization parameter $\xi$ controls the amount of shrinkage [38]-[40]. However, this formulation is still unsatisfactory, because the solution of (5) is only suboptimal.
To solve the above optimization problem thoroughly, a series of auxiliary variables $b = (b_1, b_2, \ldots, b_i, \ldots, b_n)$ is introduced to control which samples are used for training: each $b_i$ corresponds one-to-one to $s_i$ and determines whether the sample $s_i$ of the dataset $S$ is selected as a training sample. After some transformations, (5) can be converted into another form:
$$\min_{b,\, a_i} \sum_{i=1}^{n} \left( \left\| s_i - \sum_{j=1}^{n} a_{i,j} s_j \right\|^2 + \sum_{j=1}^{n} \frac{a_{i,j}^2}{b_j} \right) + \phi \left\| b \right\|_1, \quad \text{s.t. } b_j \ge 0,\ j = 1, \ldots, n, \tag{6}$$
where $\phi$ is a regularization parameter. Minimizing $\|b\|_1$ yields a sparse vector in which some elements are zero. If $b_j = 0$, we must have $a_{i,j} = 0$ for all $i$; otherwise the objective in (6) would become infinite and the optimization would have no satisfactory solution. The $j$-th sample $s_j$, which corresponds to $b_j$, is then not selected, so the most informative samples can be chosen according to the values of $b_j$. Reference [24] shows that (6) is a convex optimization problem; hence a globally optimal solution can be computed.

C. THE ALGORITHM OF MAED
Considering that the TED approach has some shortcomings and that the underlying geometric attributes of the sample space are exceedingly useful for increasing classification accuracy, it is natural to bring the manifold structure into the active learning process. An effective way to do so is to carry out the active learning task in a manifold adaptive kernel space. The details are as follows. Given the sample dataset $S$, define $r_s = (R(s, s_1), \ldots, R(s, s_n))^T$; the reproducing kernel $\tilde{R}(s, q)$ of the Hilbert space can then be represented as in (7) [31], [32], where $E$ denotes an identity matrix, $R$ is a kernel matrix from the Hilbert space, and $\rho \ge 0$ is a constant used to regulate the smoothness of $\tilde{R}(s, q)$. The matrix $R$ can be obtained with classic kernels such as the linear kernel or the Gaussian kernel. In constructing the reproducing kernel, the most important issue is the choice of the matrix $F$: different graph construction methods yield different matrices $F$. By choosing an $F$ based on the graph Laplacian [30], [41], the MAED method obtains a data-dependent kernel and solves the convex TED problem in the manifold adaptive kernel space. For the sample dataset $S$, let $\tilde{R}(s_i, s_j) = \langle \psi(s_i), \psi(s_j) \rangle$, where $\psi$ denotes a feature map from the raw input space to the reproducing kernel Hilbert space [32], [42]. Then the convex TED problem (6) is rewritten as
$$\min_{b,\, a_i} \sum_{i=1}^{n} \left( \left\| \psi(s_i) - \psi(S)\, a_i \right\|^2 + \sum_{j=1}^{n} \frac{a_{i,j}^2}{b_j} \right) + \phi \left\| b \right\|_1, \quad \text{s.t. } b_j \ge 0,\ j = 1, \ldots, n, \tag{8}$$
where $\psi(S)$ denotes the data matrix in the reproducing kernel Hilbert space, i.e., $\psi(S) = (\psi(s_1), \ldots, \psi(s_n))$. With (8), we consider the $n \times n$ kernel matrix $\tilde{R}$ with entries $\tilde{R}_{i,j} = \tilde{R}(s_i, s_j)$, and let $v_i$ be its $i$-th column (or row, because $\tilde{R}$ is symmetric).
Then, after some mathematical deductions, we obtain
$$a_i = \left( \operatorname{diag}(b)^{-1} + \tilde{R} \right)^{-1} v_i, \tag{9}$$
where $\operatorname{diag}(b)$ denotes a diagonal matrix whose main-diagonal elements are $b_1, b_2, \ldots, b_n$. Once $a_i$ is obtained, fix $a_i$ and search for the minimizing $b_j$. After similar derivations, we obtain
$$b_j = \sqrt{\frac{1}{\phi} \sum_{i=1}^{n} a_{i,j}^2}. \tag{10}$$
Therefore, by iterating these two updates until convergence, $a_{i,j}$ and $b_j$ can be computed. The most informative samples are then chosen according to the values $b_j$ ($j = 1, 2, \ldots, n$) arranged in descending order.
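The alternating scheme can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it assumes the two closed-form updates $a_i = (\operatorname{diag}(b)^{-1} + K)^{-1} v_i$ and $b_j = \sqrt{\sum_i a_{i,j}^2 / \phi}$ that follow from setting the gradient of the convex kernel TED objective to zero, starts from $a_{i,j} = 1$, and runs a fixed number of iterations instead of testing convergence. The function names and default values are hypothetical.

```python
import numpy as np

def convex_ted_updates(K, phi=0.1, n_iter=50):
    """Alternate the a-step and b-step on an (n, n) symmetric PSD kernel matrix K.

    Returns b (one selection score per sample) and A, whose column i holds a_i.
    """
    n = K.shape[0]
    A = np.ones((n, n))                        # start from a_{i,j} = 1
    for _ in range(n_iter):
        # b-step: b_j = sqrt(sum_i a_{i,j}^2 / phi); a_{i,j} is stored at A[j, i].
        b = np.sqrt((A ** 2).sum(axis=1) / phi)
        # a-step for all i at once: (diag(b)^{-1} + K)^{-1} K = (I + diag(b) K)^{-1} diag(b) K.
        # The right-hand form avoids dividing by entries of b that have shrunk to zero.
        DK = b[:, None] * K
        A = np.linalg.solve(np.eye(n) + DK, DK)
    b = np.sqrt((A ** 2).sum(axis=1) / phi)    # scores consistent with the final A
    return b, A

def rank_samples(b):
    """Sample indices ordered from largest to smallest b_j."""
    return np.argsort(-b)
```

With a linear kernel `K = X @ X.T`, the call `rank_samples(convex_ted_updates(K)[0])[:t]` returns the indices of the `t` samples with the largest scores, which would then be sent for labeling.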

D. THE NOVEL ALGORITHM
The performance of the MAED approach depends mainly on the choice of the matrix $F$. It is therefore possible to design a robust active learning method that uses a new graph construction to compute the manifold adaptive matrix. By calculating the locally linear reconstruction coefficients of the sample points in a semi-supervised manner and incorporating the manifold structure into the active learning process, we propose a novel active learning algorithm for image classification called semi-supervised adaptive experimental design (SSAED). In this subsection, we first give the graph construction method used in this paper and then present the new active learning algorithm based on this graph adaptive kernel space. Details of graph embedding can be found in reference [43].

1) GRAPH CONSTRUCTION
(1) Construct a nearest neighbor graph in a semi-supervised manner. Given a set of sample points $S = \{s_1, s_2, \ldots, s_n\}$ in $\mathbb{R}^m$, for any data point $s_i$ with a known class, first find the samples that have the same class label, and then use the Euclidean distance to find its $k$ nearest samples within that class. For a sample without class information, directly use the Euclidean distance to find the $k$ nearest data points in the whole dataset.
(2) Compute the reconstruction weight matrix and the reconstruction graph. The weight matrix $W$ is computed by solving the following optimization problem:
$$\min_{W} \sum_{i=1}^{n} \left\| s_i - \sum_{j \in N_i} W_{ij} s_j \right\|^2, \tag{11}$$
where $W_{ij}$ denotes the contribution of the $j$-th sample point to the $i$-th sample point, and $N_i$ denotes the nearest neighbors of the sample point $s_i$. To make problem (11) well-posed, the cost function is minimized subject to two constraints: (i) $s_i$ is reconstructed only from its neighbors, in other words, $W_{ij} = 0$ for all $j \notin N_i$; (ii) the elements of each row of the coefficient matrix sum to one, i.e., $\sum_{j=1}^{n} W_{ij} = 1$ for all $i$.
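The two graph construction steps can be sketched in Python as follows. The sketch assumes that unlabeled samples are marked with the label -1, that a labeled sample with no other labeled sample of its class falls back to the whole dataset, and that the local Gram matrix is regularized in the usual LLE fashion; these choices, the function names, and the value of `reg` are assumptions made for the example and are not specified in the text.

```python
import numpy as np

def semi_supervised_neighbors(X, y, k):
    """Return one array of neighbor indices per sample.

    X: (n, d) samples as rows. y: (n,) integer labels, -1 for unlabeled (assumed encoding).
    """
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = []
    for i in range(n):
        if y[i] >= 0:
            cand = np.flatnonzero(y == y[i])    # labeled: search within the same class
        else:
            cand = np.arange(n)                 # unlabeled: search the whole dataset
        cand = cand[cand != i]
        if cand.size == 0:                      # fallback (assumption): lone class member
            cand = np.delete(np.arange(n), i)
        neighbors.append(cand[np.argsort(dist[i, cand])][:k])
    return neighbors

def lle_weights(X, neighbors, reg=1e-3):
    """Solve the constrained least-squares problem row by row.

    Row i of W is zero outside N_i and sums to one.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i, nbrs in enumerate(neighbors):
        Z = X[nbrs] - X[i]                      # neighbors shifted so that s_i is the origin
        G = Z @ Z.T                             # local Gram matrix
        G = G + reg * max(np.trace(G), 1e-12) * np.eye(len(nbrs))  # regularization (assumption)
        w = np.linalg.solve(G, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()                # enforce the sum-to-one constraint
    return W
```

Because the sum-to-one constraint makes the reconstruction invariant to translation, each row reduces to solving one small linear system of size at most $k$.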

2) THE NEW ACTIVE LEARNING ALGORITHM
Applying the graph constructed above, the procedure of our novel active learning algorithm can be summarized as follows:
Input: The samples $S = \{s_1, s_2, \ldots, s_n\}$ in Euclidean space, which are partially labeled with class information.
(1) Build the locally linear reconstruction graph based on LLE [37]. Under the semi-supervised condition, construct a nearest neighbor graph with weight matrix $W$ as described in (11), and then calculate the locally linear reconstruction graph as presented in (12).
(2)-(3) ... (7) ...; let $r_i$ be the $i$-th column of the above $R$, and let $R(i, j)$ be the element in the $i$-th row and $j$-th column of $R$. Finally, we obtain the manifold adaptive kernel $R_F$.
(4) Calculate the deformed convex TED in the reproducing kernel Hilbert space. Let $v_i$ be the $i$-th column (or row, because $R_F$ is symmetric) of the new kernel $R_F$ calculated in the previous step. Initialize $a_{i,j} = 1$; then $a_i$ and $b_j$ are ultimately obtained by the iterative updates described in Section II-C.
(5) Choose samples. The samples of the original dataset $S = \{s_1, s_2, \ldots, s_n\}$ are sorted in descending order of $b_j$. The maximally informative samples are then the top $t$ samples, because $b_j$ and $s_j$ correspond one-to-one. It is noteworthy that only the subset formed by the top $t$ samples needs to be labeled, which is easier than labeling the whole dataset.
Output: All the original data samples, rearranged according to their importance, and the optimal subset used for the classification task.
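The kernel construction and the final selection step can be sketched in Python as follows. The sketch rests on two loudly stated assumptions, since the corresponding formulas are not reproduced here: the graph matrix is taken to be the usual LLE form $F = (E - W)^T (E - W)$, and the deformed kernel follows the standard data-dependent kernel construction with $M = \rho F$, i.e., $R_F = R - R (E + M R)^{-1} M R$. The exact place where $\rho$ enters in the paper's kernel may differ. Function names and the default $\rho$ are hypothetical.

```python
import numpy as np

def manifold_adaptive_kernel(R, W, rho=0.01):
    """Deform a base kernel matrix R with a reconstruction graph W.

    R: (n, n) symmetric PSD kernel matrix (e.g. linear or Gaussian).
    W: (n, n) reconstruction weights whose rows sum to one.
    """
    n = R.shape[0]
    E = np.eye(n)
    F = (E - W).T @ (E - W)           # assumed LLE-style graph matrix
    M = rho * F                       # assumed placement of the smoothness constant
    # Standard data-dependent kernel deformation: R - R (E + M R)^{-1} M R.
    return R - R @ np.linalg.solve(E + M @ R, M @ R)

def select_top(b, t):
    """Indices of the t samples with the largest scores b_j."""
    return np.argsort(-b)[:t]

# Example: three scores, keep the two largest.
print(select_top(np.array([0.1, 0.9, 0.5]), 2))   # -> [1 2]
```

Setting `rho=0` returns the base kernel unchanged, so the graph term can be switched off to compare against plain kernel TED.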
After the most useful samples are chosen, many kinds of image classification tasks can be carried out. Our novel algorithm, SSAED, is substantially based on TED, but it attaches great importance to the geometric structure of the data space. Like the MAED algorithm, our method is based on the manifold assumption and defines an adjacency graph with a certain neighborhood size for all samples. The two manifold learning methods differ in two main respects. First, the ways of constructing the graph are markedly different: our algorithm adopts LLE to construct the graph, while MAED uses LE. Second, the ways of using the class information of the samples are different. The novel algorithm constructs the manifold adaptive kernel in a semi-supervised manner, which uses not only the structure of the data but also its category information. This makes the proposed method more robust, efficient, and effective. MAED, in contrast, constructs the manifold adaptive kernel in an unsupervised manner only.

E. EXPERIMENTAL MATERIALS
In this study, we evaluate the performance of our novel active learning algorithm, SSAED, on several benchmark datasets. These datasets all consist of image data and are very popular in the machine learning field.

1) DESCRIPTION OF DATASETS
The programs are run on nine widely used real-world image datasets: MNIST, USPS, PIE, ORL32, ORL64, Yale32, Yale64, Yale-B, and CBCL. Except for the MNIST and USPS handwritten digit datasets, the other seven datasets all consist of human face images. ORL32 and ORL64 are versions of the ORL dataset with two different dimensionalities. Similarly, Yale32 and Yale64 are versions of the Yale dataset with two different dimensionalities. Table 1 gives the attributes of the seven broad categories used in this study.

2) EXPERIMENTAL SETTING
For data classification, we adopt a k-fold cross-validation strategy with k = 10. All experiments are performed in MATLAB 2017a on the Windows 7 operating system. The hardware consists of a Core i5 processor, 4 GB of memory, a discrete graphics card, and a 500 GB mechanical hard disk.
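For readers who wish to reproduce the evaluation protocol outside MATLAB, a 10-fold split can be sketched in Python as follows. The text does not state how the folds are drawn, so the random shuffle, the seed, and the function name are assumptions of this sketch.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n samples."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)   # k nearly equal, disjoint folds
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

# Each sample appears in exactly one test fold.
for train_idx, test_idx in kfold_indices(25, k=10):
    assert len(train_idx) + len(test_idx) == 25
```

The reported accuracy would then be the mean over the ten test folds.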

III. RESULTS AND DISCUSSION
We evaluate the performance of the novel algorithm, SSAED, on several benchmark datasets. The comparison is made with two methods: the MAED algorithm and the Random sampling approach (Random).
• The Random sampling method randomly chooses examples as the training data without considering the importance of each sample. That is, the strategy of the Random method is to select the samples in no particular order. In this study, the Random method is used as the baseline for active learning.
• The MAED algorithm, proposed by D. Cai et al. [31], selects the most informative samples with a restrictive parameter, as described in detail in the method and materials section.
The fundamental aim of this study is to investigate whether the novel method can obtain a satisfactory classification result when the training set is small. We let the proportion of labeled training data (the samples selected by the Random, MAED, and SSAED methods) vary from 0.1 to 0.9. In Fig.2, 'length(alpha)' denotes the proportion of selected training samples on a ten-point scale: '2', '4', '6', and '8' denote 0.2, 0.4, 0.6, and 0.8, respectively, so alpha = 2 means that 20% of the samples in the dataset $S = (s_1, s_2, \ldots, s_n)$ are selected to be labeled and used as the training set. As shown in Fig.2, classification accuracy increases as more samples are selected for training. No method wins in every single setting, but overall our novel method performs the best, while the Random method yields the lowest classification accuracies on all datasets. In the following, we describe each subfigure of Fig.2 in detail. As shown in Fig.2.a, SSAED performs well on the MNIST dataset when the training set is small. When 10% of the samples are selected as the training set, the classification accuracy of SSAED is over 82%, while that of the Random method is lower than 80%. Except when 70% of the samples are used as the training set, SSAED performs better than the MAED algorithm. On the USPS handwritten digit image dataset, our new method consistently outperforms the MAED method (see Fig.2.b).
The discriminative ability of the informative samples selected by SSAED is much better than that of the Random method on the USPS dataset when fewer than 50% of the data points are selected. It is noteworthy that SSAED performs very well when the training set is small. For instance, when only 10% of the data points are selected as training samples, the classification accuracy of SSAED is over 86%, approximately 3 percentage points higher than that of the Random method.
As can be seen from Fig.2.c, SSAED consistently shows better results than both the MAED and Random methods on the PIE dataset. When only 10% of the data samples are selected as training samples, the accuracy of the new method is 7 percentage points higher than that of Random and 2 percentage points higher than that of MAED. As can be seen from Fig.2.d and Fig.2.e, SSAED performs slightly better than MAED on the ORL datasets (ORL32 and ORL64) and much better than the Random method. When alpha changes from 0.2 to 0.7, SSAED is less sensitive to outliers than the other methods. On the Yale datasets (Yale32 and Yale64), SSAED performs better than MAED and Random in most cases (see Fig.2.f and Fig.2.g). However, all three methods show poor classification ability, with accuracies below 70%, on this kind of face image dataset. As can be seen from Fig.2.h, our novel method performs the best of the three on the Yale-B dataset. In particular, SSAED obtains higher classification accuracy than the Random method over the whole range of alpha from 0.1 to 0.9. As shown in Fig.2.i, SSAED consistently outperforms both MAED and Random on the CBCL dataset. As alpha increases and reaches 0.5, SSAED attains its highest classification accuracy, which is over 95%. It is noteworthy that the classification accuracy of SSAED is over 88% when only 10% of the samples are labeled as training data. Under the same condition, the accuracy of MAED is 86%, and the Random method needs more than twice as many training samples to achieve 88% classification accuracy.
The results on CBCL clearly show the advantage of our novel algorithm in selecting the best samples for labeling, i.e., the labeled samples chosen by SSAED improve the classifier the most. From subfigures 'a' to 'i' of Fig.2, we can clearly see that the classification accuracies on the handwritten datasets (MNIST and USPS) are all highly satisfactory: good classification accuracy is obtained with both small and large training sets. However, the classification results are unsatisfactory on some of the human face image datasets, e.g., Yale32 and Yale64. Additionally, SSAED consistently performs better than the other two methods on three datasets: USPS, PIE, and CBCL. It is important to note that SSAED obtains satisfactory results, with classification accuracies above 82%, when only 10% of the samples are chosen as the training set on the MNIST, USPS, and CBCL datasets.

B. AVERAGE CLASSIFICATION ACCURACIES ON SMALL RANGE TRAINING SETS
Because the fundamental aim of active learning is to select the best samples to train the classifier and still obtain a satisfactory classification accuracy, this part compares the experimental results on three small training set sizes. In terms of percentage, we choose training sets that account for 10%, 20%, and 30% of the whole dataset, respectively. All comparisons are carried out on the nine image datasets (MNIST, USPS, PIE, ORL32, ORL64, Yale32, Yale64, Yale-B, and CBCL) with the three algorithms (Random, MAED, and SSAED).
From Fig.3, we can see that SSAED performs better than both the Random and MAED methods in almost all circumstances. Except on the ORL64 and Yale-B datasets with 20% of the samples selected as the training set and on the ORL32 dataset with 30%, our algorithm is never worse than MAED. The Random method performs the worst in all conditions (see the three subfigures), which indirectly indicates that active learning can reduce the cost of manual labeling. It is noteworthy that the SSAED algorithm performs extremely well on the MNIST, USPS, and CBCL datasets. On these three datasets, all classification accuracies with small training sets are higher than 80%; on the CBCL dataset in particular, the accuracy approaches 90% with 10% of the samples used for training and exceeds 90% with 20% and 30%.

C. EFFICIENCY ANALYSIS OF THE THREE ALGORITHMS
To test the running efficiency of the SSAED algorithm, comparative experiments were carried out on the MNIST, USPS, ORL32, ORL64, Yale32, Yale64, Yale-B, and CBCL datasets. Some details of these datasets are presented in Table 1. The running time of each algorithm on the nine datasets is presented in Table 3. The Random method has the shortest running time, which is attributable to its simple structure: it does not consider the intrinsic geometry of the sample space. The running times of MAED and SSAED are relatively close because their processing steps are similar; both methods attach great importance to the geometric structure of the data space. It is important to note that the dimensionality of a dataset affects the processing time of the algorithms: the higher the dimensionality, the longer the algorithms run, as the ORL32, ORL64, Yale32, and Yale64 datasets confirm (see Table 1 and Table 3). In theory, because SSAED adopts a semi-supervised strategy to construct its geometric structure graph, it has to consider the information of both labeled and unlabeled samples, and hence needs more time than the other two methods. The results in Table 3 confirm this.
We have so far compared the three algorithms (Random, MAED, and SSAED) on two handwritten digit image datasets (MNIST, USPS) and seven human face image datasets (PIE, ORL32, ORL64, Yale32, Yale64, Yale-B, and CBCL). The results suggest that the new algorithm can achieve higher classification accuracy and has some advantages over MAED and Random. As shown in Fig.2, Fig.3, and Table 2, the SSAED algorithm yields impressive results on the compared datasets. Two reasons could account for this. First, the datasets used always contain partially labeled samples, so SSAED can take full account of their class information. SSAED can still use the unlabeled samples effectively, because the graph is constructed in a semi-supervised manner. Second, many samples from different categories overlap seriously with each other. All this indicates that constructing the graph in a semi-supervised manner has clear merits [33]-[36]. Among the three methods, the Random method always performs the worst on all the image datasets, and both MAED and SSAED are much better than Random. The main reason is that an active learning method deliberately selects the most useful samples to label as the training set, which helps to achieve maximum efficiency [31]. In the Random method, by contrast, the training set is constructed by choosing samples at random, which easily puts uninformative samples into the training set and makes it difficult to obtain a satisfactory classification accuracy. Our experimental results further indicate that active learning has great application value.

IV. CONCLUSION
As shown in Section II, we have presented a novel active learning method for data classification that incorporates the locally linear reconstruction coefficient matrix into the learning process, where the coefficients are computed in a semi-supervised manner. The new algorithm attaches great importance to the underlying manifold structure of the data space. The classification accuracies on nine real-world datasets indicate that SSAED surpasses both the MAED and Random methods, especially when only a few examples are chosen as the training set. However, this research still has some shortcomings. One weakness of the novel approach is that the experimental results are not satisfactory on all the image datasets: the new method works well on some datasets but not on all. In the future, we need to design a more universal approach that performs well on more datasets. Additionally, the key improvement of our algorithm is the locally linear reconstruction framework, which comes from LLE. We adopt this framework to calculate the reconstruction coefficients used to construct the novel graph, but it is difficult to determine which graph construction is the best. We will therefore try other methods, such as sparse reconstruction in a semi-supervised or supervised manner, to construct the graph in future classification studies.