An Efficient Multi-Label SVM Classification Algorithm by Combining Approximate Extreme Points Method and Divide-and-Conquer Strategy

Excessive time complexity has severely restricted the application of support vector machines (SVMs) to large-scale multi-label classification. This paper therefore proposes an efficient multi-label SVM classification algorithm that combines the approximate extreme points method and a divide-and-conquer strategy (AEDC-MLSVM). The AEDC-MLSVM classification algorithm first uses the approximate extreme points method to obtain a representative set from the multi-label training set. While preserving almost all the useful information of the multi-label training set, the representative set effectively reduces its scale. Second, to obtain an efficient multi-label SVM classification model, an SVM based on the improved divide-and-conquer strategy is trained on the representative set, which further improves the training speed and classification performance. The improvement is reflected in two aspects. (1) The improved divide-and-conquer strategy divides the representative set into subsets while ensuring that each representative subset contains a certain number of positive and negative instances, which avoids singular problems and overcomes the computation load imbalance problem. (2) The different error costs (DEC) method is applied to overcome the label imbalance problem. Experiments show that the training and testing of the AEDC-MLSVM classification algorithm can be accelerated substantially while maintaining classification performance.


I. INTRODUCTION
Compared with traditional binary or multi-class classification, multi-label classification differs in that each instance can have multiple labels, so these labels are no longer mutually exclusive [1]. Many methods have been proposed to solve the multi-label classification problem, including SVM methods, decision tree methods, neural network methods, K-nearest neighbor methods, etc. [2]. These methods have been widely recognized and have successfully solved many real-world practical problems, such as image and video semantic annotation [3], [4], text categorization [5], music emotion classification [6], bioinformatics prediction [7] and so on.
The associate editor coordinating the review of this manuscript and approving it for publication was Huiling Chen .
With the arrival of the big data era, many real-world applications need to be implemented on large-scale multi-label data sets. However, many existing multi-label classification methods cannot be applied to large-scale multi-label data sets effectively. The main reason is that these methods are severely restricted by excessive time complexity, and this is especially evident for SVM. In this paper, we focus on efficient multi-label SVM classification methods.
SVM [8] is an extraordinarily well-known machine learning method, which has been applied successfully in face detection, handwriting recognition, text categorization, etc. [9]. Traditional SVM can only solve the single-instance single-label classification problem, but improved SVM algorithms, such as the Rank-SVM [10] algorithm, can be applied to multi-label classification. However, many real-world multi-label data sets are non-linear. Hence, to obtain competitive performance, SVM needs a non-linear kernel to train on these multi-label data sets, which further limits the use of multi-label SVM classification algorithms on large-scale data sets. In addition, the multi-label SVM classification algorithm cannot avoid the fact that the vast majority of multi-label data sets suffer from a serious label imbalance problem [11], which seriously affects classification performance.
The main contributions of this paper are as follows: (1) The proposed AEDC-MLSVM classification algorithm addresses the excessive time complexity that seriously restricts the application of multi-label SVM classification algorithms to large-scale data sets.
(2) The principle of the proposed SVM combining the approximate extreme points method and the divide-and-conquer strategy (AEDC-SVM) is shown in FIGURE 2. The proposed AEDC-SVM not only ensures classification performance, but also greatly reduces the size of the training set and the negative impact of the label imbalance problem, solves the computation load imbalance problem and prevents singular problems. This further improves the applicability of the AEDC-MLSVM classification algorithm to large-scale data sets.
(3) The experimental results on three public real-world data sets show that the training and testing time of the AEDC-MLSVM algorithm is the shortest compared with existing multi-label classification algorithms such as ML-LIBSVM [12], ML-CVM [13] and ML-BVM [14]. Moreover, the performance of the AEDC-MLSVM algorithm on the five evaluation indexes is pretty close to that of ML-LIBSVM and better than that of ML-CVM and ML-BVM.
The rest of this paper is organized as follows. Chapter 2 will introduce some related works. The new AEDC-MLSVM classification algorithm is proposed in chapter 3. After that, the analysis of the experiment results is presented in chapter 4. Chapter 5 is the summary of this paper.

II. RELATED WORK
From the beginning, multi-label classification has attracted wide attention from experts in machine learning, pattern recognition, statistics and other fields. For different practical problems, various kinds of multi-label classification methods have been proposed and have achieved good results. These multi-label classification methods can be summarized under the following two main strategies: the problem transformation strategy and the algorithm adaptation strategy [2]. Moreover, many methods have been proposed to solve the problem of label imbalance in multi-label classification. This chapter will first introduce the existing multi-label classification methods according to the two strategies, and then introduce current methods for handling the label imbalance problem.
The problem transformation strategy mainly transforms a multi-label classification problem into several single-label classification problems. As a result, this type of multi-label classification method is mainly achieved by combining a problem transformation skill with existing single-label classification methods. Problem transformation skills mainly contain binary relevance (BR), one-by-one (OBO), one-versus-one (OVO) and label powerset (LP) [2], etc. Frequently-used single-label classification methods contain SVM, decision trees, neural networks, nearest neighbors and so on [2].
In [15], three main defects of the BR problem transformation strategy are described. First of all, since it assumes that the labels are independent, dependencies among labels are not exploited. Secondly, it is likely to cause the label imbalance problem. Finally, as the number of labels increases, the label imbalance problem aggravates with the growing number of classifiers. Despite the above problems, the BR problem transformation strategy is still considered to be simple and practical, and the data set can be reconstructed. In [16], the author highlights its advantages. Firstly, any single-label classifier can be used as the base classifier to accomplish multi-label classification. Secondly, its complexity is lower than that of other methods and is linear in the number of labels. Thirdly, because of the independence among labels, it can easily be parallelized. Finally, an advantage that needs to be emphasized is that it can optimize multiple loss functions. Thus, this paper will use the well-known BR problem transformation strategy to accomplish multi-label classification.
The algorithm adaptation strategy accomplishes multi-label classification by improving single-label classification algorithms. By improving the information entropy formula and setting the leaf nodes as a label set, a C4.5-type multi-label classification algorithm is proposed in [17]. This multi-label classification algorithm is suitable for small-scale data sets. The Rank-SVM [10] algorithm accomplishes multi-label classification by minimizing the ranking loss of multi-class SVM, which leads to an extremely complex quadratic programming problem. To overcome the high time complexity of the Rank-SVM algorithm, the Rank-CVM [18] and Rank-CVMz [19] algorithms are proposed by adopting the core vector machine (CVM) and a zero label, which improve the training speed to a certain extent, but reduce the classification effect. By combining the advantages of ranking support vector machines and binary relevance with robust low-rank learning, the RBRL [32] algorithm is proposed. This algorithm can solve the optimization problem efficiently by adopting two accelerated proximal gradient (APG) methods. The ML-SLSTSVM [33] algorithm improves the MLTSVM algorithm by introducing structural information of training instances and adopting the least squares method. Although the RBRL and ML-SLSTSVM algorithms can improve classification performance, they can only be applied to small-scale data sets. The ML-KNN [20] algorithm, which is based on the k nearest neighbors (KNN), can estimate the label prior probability and conditional probability by independently applying the discrete binary Bayes rule to each label. The BP-MLL classification algorithm [21] can express multi-label features by constructing a new empirical loss function. These algorithms are difficult to apply to large-scale data sets.
In [22], the influence of label imbalance problem on various classification algorithms is introduced in detail. In order to overcome this problem, many countermeasures have been proposed and have achieved good effect. These countermeasures can be summarized as the following three mainstream methods: resampling method [22], instance-based method [23] and cost sensitive method [24]. The DEC method adopted in this paper is a specific implementation of cost sensitive method.
To sum up, the use of existing multi-label classification algorithms on large-scale data sets is seriously limited by heavy time complexity, and this phenomenon is even more severe for algorithms based on SVM. The AEDC-MLSVM classification algorithm proposed in this paper is a good solution to this problem. It not only greatly shortens the time consumed by training and testing, but also achieves classification performance close to that of the ML-LIBSVM classification algorithm and better than that of the ML-CVM and ML-BVM classification algorithms. In addition, the DEC method is adopted to reduce the impact of the label imbalance problem.

III. AN EFFICIENT MULTI-LABEL SVM CLASSIFICATION ALGORITHM BY COMBINING APPROXIMATE EXTREME POINTS METHOD AND DIVIDE-AND-CONQUER STRATEGY
In this chapter, we will firstly introduce the BR problem transformation strategy. Secondly, we will explain the principle of approximate extreme points method. Thirdly, the AEDC-SVM is elaborated in detail. Fourthly, we will design and implement the AEDC-MLSVM classification algorithm. Finally, we will analyze the time and space complexity of the AEDC-MLSVM classification algorithm.

A. BINARY RELEVANCE PROBLEM TRANSFORMATION STRATEGY
Suppose D = {(x_i, Y_i) | i = 1, · · · , N} represents a multi-label training data set, where x_i represents a data vector of d feature values, Y_i ⊆ Q is its relevant label set, and Q = {q_1, · · · , q_k} represents the label set. First of all, the BR problem transformation strategy transforms the multi-label training data set D into k independent binary training subsets D_{q_j}, j = 1, · · · , k: an instance x_i with q_j ∈ Y_i is regarded as a positive training instance of D_{q_j}, i.e. y_i = 1, and as a negative training instance otherwise, i.e. y_i = −1. After that, by training on each binary training subset D_{q_j}, the corresponding binary classifier h_{q_j}(x) is constructed. Finally, the BR problem transformation strategy integrates the results of the k binary classifiers to realize multi-label classification.
In order to utilize the BR problem transformation strategy to realize multi-label classification effectively, the following decision function is used to integrate all binary classification results:

Y = {q_j | h_{q_j}(x) = 1, 1 ≤ j ≤ k}.    (2)
Meanwhile, the following rule is used to avoid obtaining an empty relevant label set: when formula 2 yields the empty set, the single label whose binary classifier produces the largest real-valued output f_{q_j}(x) is predicted, i.e.

Y = {arg max_{q_j ∈ Q} f_{q_j}(x)}  if {q_j | h_{q_j}(x) = 1} = ∅.    (3)
From chapter 2, we know that the BR problem transformation strategy is effective and practical, and it can be effectively applied to large-scale multi-label classification. Consequently, the BR problem transformation strategy will be used to realize multi-label classification.
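As a concrete illustration, the BR transformation and the empty-set fallback described above can be sketched as follows. This is a minimal sketch: the centroid-based classifier is only a toy stand-in for the per-label SVM, and all names here (`br_fit`, `br_predict`, `CentroidClassifier`) are hypothetical rather than taken from the paper.

```python
import numpy as np

class CentroidClassifier:
    """Toy stand-in for the per-label binary SVM: scores an instance by how
    much closer it lies to the positive centroid than to the negative one."""
    def fit(self, X, y):
        self.pos = X[y == 1].mean(axis=0)
        self.neg = X[y == -1].mean(axis=0)
        return self

    def decision_function(self, X):
        return (np.linalg.norm(X - self.neg, axis=1)
                - np.linalg.norm(X - self.pos, axis=1))

def br_fit(X, Y):
    """Train one binary classifier per label column of the n x k 0/1
    indicator matrix Y (the BR problem transformation)."""
    k = Y.shape[1]
    return [CentroidClassifier().fit(X, np.where(Y[:, j] == 1, 1, -1))
            for j in range(k)]

def br_predict(classifiers, x):
    """Union of positive binary decisions (formula 2); if all classifiers
    reject, fall back to the highest-scoring label so the predicted label
    set is never empty (formula 3)."""
    scores = np.array([clf.decision_function(x[None, :])[0]
                       for clf in classifiers])
    relevant = {j for j, s in enumerate(scores) if s > 0}
    return relevant if relevant else {int(np.argmax(scores))}

# Tiny synthetic example: label 0 fires on large x[0], label 1 on large x[1].
X = np.array([[2.0, 0.0], [2.5, 0.2], [0.1, 2.0], [0.0, 2.4], [2.2, 2.1]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]])
clfs = br_fit(X, Y)
print(br_predict(clfs, np.array([2.3, 0.1])))   # label 0 expected
```
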

B. THE PRINCIPLE OF APPROXIMATE EXTREME POINTS METHOD
Before explaining the principle of the approximate extreme points method, we convert D_{q_j} (defined in the previous subsection) into its instance set X_{q_j} = {x_i | (x_i, y_i) ∈ D_{q_j}}. Since the approximate extreme points method is based on the extreme points principle, we will first introduce the extreme points principle [25], [26]. It can be seen from FIGURE 1 that any vector x_i in X_{q_j} can be represented by a convex combination of the vector set EP(X_{q_j}):

x_i = Σ_t β_{i,t} e_t,  e_t ∈ EP(X_{q_j}),    (4)
where 0 ≤ β_{i,t} ≤ 1 and Σ_t β_{i,t} = 1. It can be seen from formula 4 that any vector x_i in X_{q_j} can be obtained using only EP(X_{q_j}) and the convex combination weight parameter set {β_{i,t}}. Therefore, we define EP(X_{q_j}) as the extreme points set of X_{q_j}. EP(X_{q_j}) not only contains almost all the important information of X_{q_j}, but its quantity is also far less than that of X_{q_j}. Thus, training SVM on EP(X_{q_j}) can greatly improve the training and testing speed while ensuring the classification performance. However, when facing a large-scale training data set, the solution complexity of the extreme points method is high. For this reason, the approximate extreme points method is proposed. Before introducing it, we assume that the kernel space transformation set of X_{q_j} is A = {a_1, · · · , a_N}, and that A is divided into subsets A_l such that N is divisible by |A_l|, where for any a_i, a_j ∈ A_l we have y_i = y_j, i.e., each subset contains instances of a single class. Here, |A_l| represents the number of training instances in A_l. A_{lg} denotes an arbitrary subset of A_l, i.e., A_{lg} ⊆ A_l. For ∀a_i ∈ A_l, the following formula is obtained according to the extreme points principle:

a_i = Σ_{a_t ∈ EP(A_l)} β_{i,t} a_t.    (5)
A*_l can be defined as an approximate extreme points set of A_l if, for every a_i ∈ A_l, there exists a weight set {β_{i,t}} with 0 ≤ β_{i,t} ≤ 1 and Σ_t β_{i,t} = 1 such that

‖a_i − Σ_{a_t ∈ A*_l} β_{i,t} a_t‖² ≤ ε.    (6)
Therefore, the representative set A* of A can be obtained as the union of the subset-level approximate extreme points sets:

A* = ∪_l A*_l.    (7)
From A*, we can get the representative set of X_{q_j}, denoted X*_{q_j}. The size of X*_{q_j} is much smaller than that of X_{q_j}, and it contains almost all the important information of X_{q_j}. The time complexity of computing the representative set is linear in the size of X_{q_j}.
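The extreme points principle can be checked numerically on a toy 2-D set: every interior point is a convex combination of the extreme points, and the weights β can be recovered exactly when the extreme points are affinely independent. The triangle and the weight vector below are illustrative assumptions, not data from the paper.

```python
import numpy as np

# Extreme points of a toy 2-D set: the vertices of a triangle.
EP = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])

# Formula 4: an interior point x_i is a convex combination of EP with
# weights beta_{i,t} in [0, 1] that sum to 1.
beta = np.array([0.2, 0.5, 0.3])
assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
x_i = beta @ EP                      # reconstructed point

# Conversely, the weights are recovered by solving [EP^T; 1] beta = [x_i; 1]
# (exact here because the 3 triangle vertices are affinely independent).
A = np.vstack([EP.T, np.ones(3)])
b = np.append(x_i, 1.0)
beta_rec = np.linalg.solve(A, b)
print(x_i, beta_rec)
```
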

C. SVM BY COMBINING APPROXIMATE EXTREME POINTS METHOD AND DIVIDE-AND-CONQUER STRATEGY
The obtained X*_{q_j} and its corresponding label set Y*_{q_j} are used to train the SVM. The primal optimization problem of SVM can be transformed into the following dual quadratic optimization problem:

max_α Σ_{i=1}^{M} α_i − ½ Σ_{e=1}^{M} Σ_{f=1}^{M} α_e α_f y_e y_f K(x_e, x_f)
s.t. Σ_{i=1}^{M} α_i y_i = 0, 0 ≤ α_i ≤ C, i = 1, · · · , M.    (8)

Equation 8 is a standard C-SVM model. The parameter C balances the model complexity against the training loss, α ∈ R^M is the vector of dual variables, and K(x_e, x_f) is the kernel function. By solving equation 8, we can get the optimal solution α*. Then we search for a component α*_t of α* in the interval (0, C) to calculate b*:

b* = y_t − Σ_{i=1}^{M} α*_i y_i K(x_i, x_t).    (9)

Finally, we construct the decision function h_{q_j}(x) to realize classification:

h_{q_j}(x) = sign(Σ_{i=1}^{M} α*_i y_i K(x_i, x) + b*).    (10)
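Given a solved dual vector α*, computing b* from an unbounded support vector and evaluating the decision function can be sketched as follows. The two-point training set and its dual optimum α = (0.5, 0.5) are worked out by hand for a linear kernel with C = 1; they are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Two 1-D training points with an analytically known dual solution:
# x1 = -1 (y = -1), x2 = +1 (y = +1), linear kernel, C = 1.
X = np.array([[-1.0], [1.0]])
y = np.array([-1.0, 1.0])
C = 1.0
alpha = np.array([0.5, 0.5])   # hand-derived optimum of the dual problem

def kernel(a, b):
    return a @ b               # linear kernel stand-in for K(x_e, x_f)

K = X @ X.T

# b* from any unbounded support vector t with 0 < alpha_t < C.
t = int(np.argmax((alpha > 0) & (alpha < C)))
b_star = y[t] - np.sum(alpha * y * K[:, t])

def h(x):
    """Decision function: sign of the kernel expansion plus b*."""
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X))
    return int(np.sign(s + b_star))

print(b_star, h(np.array([0.7])), h(np.array([-0.3])))
```

Here the separating point is the origin, so b* = 0 and h reduces to the sign of the input.
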
Although SVM using the approximate extreme points method has good classification performance, its use on large-scale data sets is still restricted by excessive computational complexity. In [27], the author proposed the DC-SVM algorithm, in which a divide-and-conquer strategy is used and the training speed is improved greatly. But it has the following problems: firstly, the whole problem is partitioned by the unsupervised kernel kmeans clustering method, which can easily produce subsets containing instances of only one class and thus lead to singular problems; secondly, it is difficult to balance the computation among subproblems. Therefore, we propose an algorithm in which the approximate extreme points method and the divide-and-conquer strategy are combined to improve SVM, namely AEDC-SVM. FIGURE 2 shows the improvement of AEDC-SVM. The steps are as follows.
(1) AEDC-SVM uses the approximate extreme points method to obtain the representative set X q * j and its corresponding label set Y q * j .
(2) Divide X*_{q_j} into a positive set X*+_{q_j} and a negative set X*−_{q_j} according to the positive and negative labels.
(3) Choose m training instances randomly from X*+_{q_j}, run the kernel kmeans algorithm on them to construct w positive cluster centers, and use these centers to separate X*+_{q_j} into w positive subsets.
(4) Likewise, construct w negative cluster centers and separate X*−_{q_j} into w negative subsets; then, according to the distance between positive and negative cluster centers from near to far, combine the positive and negative subsets into w representative subsets V_1, · · · , V_w, each containing both positive and negative instances.
(5) Each combined representative subset V_v can be trained on SVM efficiently and independently with the following equation:

max_{α^(v)} Σ_{i ∈ V_v} α_i − ½ Σ_{i ∈ V_v} Σ_{f ∈ V_v} α_i α_f y_i y_f K(x_i, x_f)
s.t. Σ_{i ∈ V_v} α_i y_i = 0, 0 ≤ α_i ≤ C,    (11)
where v ∈ {1, · · · , w} and α^(v) denotes the sub-vector of dual variables corresponding to the instances in V_v. All subproblem solutions are integrated to initialize an approximate whole solution ᾱ = [ᾱ^(1), · · · , ᾱ^(w)], where ᾱ^(v) is the optimal solution of the v-th subproblem. The above method can overcome the computation load imbalance problem, avoid singular problems, and improve the classification performance effectively.
Although the proposed AEDC-SVM can reduce the computational complexity and performs well on large-scale data sets, it cannot solve the multi-label classification problem or the label imbalance issue. When facing label imbalance, AEDC-SVM tends to treat each instance as a negative instance, so its results are skewed. Setting different penalty parameters for the two kinds of instances can solve the label imbalance issue: a larger penalty is assigned to the positive class when positive instances are few, which means more attention is paid to positive instances and the misclassification of a positive instance is punished more rigorously. This is the idea of the DEC method. Based on the DEC method, we improve the original AEDC-SVM optimization problem as follows:

max_{α^(v)} Σ_{i ∈ V_v} α_i − ½ Σ_{i ∈ V_v} Σ_{f ∈ V_v} α_i α_f y_i y_f K(x_i, x_f)
s.t. Σ_{i ∈ V_v} α_i y_i = 0, 0 ≤ α_i ≤ C_+ (y_i = 1), 0 ≤ α_i ≤ C_− (y_i = −1),    (12)
where C_+ and C_− denote the different penalty parameters for positive and negative instances respectively. It can be seen from formula 12 that, by selecting different penalty parameters C_+ and C_− for the two kinds of instances, the label imbalance problem can be effectively alleviated.
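One common instantiation of the DEC idea scales C_+ by the negative-to-positive ratio and leaves C_− at C; this schedule is an assumption here (the text does not spell out how C_+ and C_− are chosen). The two values then act as per-instance box constraints on the dual variables of formula 12.

```python
import numpy as np

def dec_penalties(y, C=1.0):
    """A common DEC heuristic (an assumption, not spelled out in the text):
    scale the positive-class penalty by the class ratio so that the rare
    positive instances are punished more heavily for misclassification."""
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == -1))
    return C * n_neg / n_pos, C          # (C+, C-)

def box_constraints(y, C_pos, C_neg):
    """Per-instance upper bounds on the dual variables in formula 12:
    0 <= alpha_i <= C+ for positive instances, C- for negative ones."""
    return np.where(y == 1, C_pos, C_neg)

y = np.array([1, 1, -1, -1, -1, -1, -1, -1])   # 2 positives, 6 negatives
C_pos, C_neg = dec_penalties(y, C=1.0)
print(C_pos, C_neg)                            # positives get the larger cap
print(box_constraints(y, C_pos, C_neg))
```
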

D. DESIGN AND IMPLEMENTATION OF AEDC-MLSVM CLASSIFICATION ALGORITHM
The proposed AEDC-MLSVM classification algorithm adopts the BR problem transformation strategy to implement multi-label classification. Firstly, it transforms the multi-label training data set into k binary training data sets based on the number of labels. Each binary training data set is composed of positive instances and negative instances, and the number of instances of each binary training data set is the same as that in the multi-label training data set.
Secondly, for each binary training data set, we adopt AEDC-SVM method to get its classifier h q j (x). The main steps are as follows.
Step 1: The representative set is obtained by using the approximate extreme points method.
Step 2: The representative set is divided into positive representative set and negative representative set according to positive and negative labels.
Step 3: m training instances are chosen randomly from the positive representative set. Then the kernel kmeans algorithm is run on the m training instances and w positive cluster centers are constructed in the kernel space. After that, the w cluster centers are used to separate the positive representative set into w positive subsets.
Step 4: m training instances are chosen randomly from the negative representative set. Then the kernel kmeans algorithm is run on the m training instances and w negative cluster centers are constructed in the kernel space. After that, the w cluster centers are used to separate the negative representative set into w negative subsets.
Step 5: According to the distance between positive and negative clustering centers from near to far, w representative subsets are obtained by combining the positive and negative subsets. Each representative subset contains positive and negative instances.
Step 6: Each representative subset is trained with the improved LIBSVM algorithm and the vector of dual variables ᾱ^(v) is obtained, where ᾱ^(v) represents the optimal solution for the v-th representative subset. The improved LIBSVM is used because it combines the DEC method with the SMO algorithm.
Step 7: The vector of dual variables of the representative set is obtained by integrating each ᾱ^(v), i.e., ᾱ = [ᾱ^(1), · · · , ᾱ^(w)]. Then the classifier h_{q_j}(x) is obtained.
Finally, we integrate the results of each classifier h q j (x) by formulas 2 and 3, and efficient multi-label classification is achieved. The pseudocode of AEDC-MLSVM classification algorithm is shown in Algorithm 1.
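The pairing of positive and negative clusters by center distance in Step 5 can be sketched as follows. The greedy nearest-pair strategy is one plausible reading of "from near to far", and the function name `pair_clusters` is hypothetical.

```python
import numpy as np

def pair_clusters(pos_centers, neg_centers):
    """Step 5 sketch: greedily combine positive and negative clusters,
    nearest center pairs first, so every representative subset receives
    instances of both classes (avoiding singular one-class subproblems)."""
    w = len(pos_centers)
    # w x w matrix of distances between positive and negative centers.
    d = np.linalg.norm(pos_centers[:, None, :] - neg_centers[None, :, :],
                       axis=2)
    pairs, used_p, used_n = [], set(), set()
    for idx in np.argsort(d, axis=None):        # nearest pairs first
        i, j = divmod(int(idx), w)
        if i not in used_p and j not in used_n:
            pairs.append((i, j))
            used_p.add(i)
            used_n.add(j)
    return sorted(pairs)

pos = np.array([[0.0, 0.0], [10.0, 10.0]])      # toy positive centers
neg = np.array([[9.0, 9.0], [1.0, 1.0]])        # toy negative centers
print(pair_clusters(pos, neg))                  # each positive gets its
                                                # nearest negative cluster
```
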

Algorithm 1 AEDC-MLSVM Classification Algorithm
Input: D, training data set {(x_i, Y_i) | i = 1, · · · , N}; Q, the set of all labels {q_1, q_2, · · · , q_k}; x, testing data, x ∈ R^d; k, total number of labels; P, maximum size of subsets after first-level partition; V, maximum size of subsets after second-level partition; ε, minimal positive real constant; β, positive real constant; w, number of cluster centers.
Output: Y, the predicted label set of x.
begin
1) Transform the multi-label training data set D into k binary training data sets D_{q_1}, D_{q_2}, · · · , D_{q_k} with the BR problem transformation strategy.
2) for each binary training data set D_{q_j} (q_j ∈ Q, j = 1, 2, · · · , k) do
   (a) Obtain the representative set D*_{q_j} with the approximate extreme points method (parameters P, V, ε);
   (b) Divide D*_{q_j} into a positive set D*+_{q_j} and a negative set D*−_{q_j}, and choose m training instances randomly from each, respectively;
   (c) Run the kernel kmeans algorithm on the m positive instances to construct w cluster centers {c+_1, · · · , c+_w}; use them to separate D*+_{q_j} into w subsets {V+_1, · · · , V+_w};
   (d) Run the kernel kmeans algorithm on the m negative instances to construct w cluster centers {c−_1, · · · , c−_w}; use them to separate D*−_{q_j} into w subsets {V−_1, · · · , V−_w};
   (e) According to the distance between positive and negative cluster centers from near to far, combine the subsets into w mutually exclusive representative subsets {V_1, · · · , V_w}; for each V_v, use LIBSVM(V_v, β) to obtain the optimal solution ᾱ^(v) with formulas 11 and 12;
   (f) Get h_{q_j}(x) of D*_{q_j} according to ᾱ = [ᾱ^(1), · · · , ᾱ^(w)] and formulas 9 and 10;
   end
3) Integrate the results of the k classifiers h_{q_j}(x) by formulas 2 and 3 to obtain Y.
end
Through the introduction of AEDC-SVM in the previous subsection, we can expect that the AEDC-MLSVM classification algorithm with a non-linear kernel will perform well on large-scale data sets. It will shorten the training and testing time, while the classification performance of the AEDC-MLSVM algorithm remains similar to that of ML-LIBSVM.

E. TIME AND SPACE COMPLEXITY ANALYSIS OF AEDC-MLSVM CLASSIFICATION ALGORITHM
We know that the training time complexity of the standard SVM classification algorithm is O(N³) and its space complexity is O(N²), where N represents the size of the training data set. The time complexity of obtaining the representative set with the approximate extreme points method is O(kN). Because there are M/w dual variables in formulas 11 and 12, the time complexity of the AEDC-MLSVM classification algorithm is at least O(kM²/w), and its space complexity is O(kM²/w²), where k represents the number of labels, w represents the number of cluster centers, and M represents the size of the representative set, which is far less than N. Therefore, the time and space complexity of the AEDC-MLSVM classification algorithm is greatly reduced, and it can be applied well to large-scale multi-label data sets.
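A rough back-of-envelope comparison makes the gap concrete. The sizes below are illustrative assumptions, not values from the experiments.

```python
# Illustrative numbers only (assumptions, not measured values):
N = 100_000        # training set size
M = 10_000         # representative set size (M << N)
k = 20             # number of labels
w = 4              # cluster centers per binary problem

standard = N ** 3             # O(N^3) standard SVM training cost
aedc = k * M ** 2 / w         # O(k M^2 / w) lower bound from the text
print(standard / aedc)        # speedup factor under these assumptions
```
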

IV. EXPERIMENTS

A. DESCRIPTION OF THREE PUBLIC REAL-WORLD DATA SETS
To confirm the effectiveness of the proposed AEDC-MLSVM classification algorithm, we conduct experiments on three public real-world data sets. TMC2007-500 is a text data set in which each instance represents an aviation safety report and each label represents a type of safety issue described in the report. In this data set, each aviation safety report may contain multiple types of safety issues, that is, multiple labels. Mediamill(exp1) is a video data set in which each instance represents a video and each label represents an annotation concept. In this data set, each video can contain multiple annotation concepts, that is, multiple labels. EukaryoteGO is a bioinformatics data set in which each instance represents a protein sequence and each label represents a type of sub-cellular location. In this data set, each protein sequence may be associated with multiple sub-cellular locations, that is, multiple labels. These data sets can be obtained from public websites [28], and detailed descriptions are shown in TABLE 1.

B. THREE COMPARABLE MULTI-LABEL CLASSIFICATION ALGORITHMS
To verify the advantages of AEDC-MLSVM classification algorithm, we select three comparable multi-label classification algorithms, i.e., ML-LIBSVM [12], ML-CVM [13] and ML-BVM [14]. These algorithms are implemented by combining the BR problem transformation skills and existing single-label algorithms, i.e., LIBSVM, CVM and BVM. As the benchmark algorithm of experiments, ML-LIBSVM algorithm can achieve good classification performance, but its training and testing time complexity is too high. ML-CVM and ML-BVM algorithms are commonly used to improve the training efficiency of multi-label classification. These algorithms have been applied to many practical problems and achieved good results.

C. FIVE COMMON EVALUATION INDEXES
Because of the characteristics of multi-label classification, its evaluation indexes are more complex than those of single-label classification. At present, many multi-label classification evaluation indexes have been used [29]-[31]. Five common evaluation indexes are chosen to evaluate the experimental results: coverage, ranking loss, hamming loss, one-error and average-precision.
(1) Coverage: it is applied to evaluate how many steps are needed, on average, to move along the ranked label list in order to cover all the relevant labels of an instance. This evaluation index is computed as follows:

Coverage = (1/n) Σ_{i=1}^{n} max_{l_j ∈ Y_i} r_i(l_j) − 1,    (13)

here, n is the number of test instances and r_i(l_j) represents the rank position of label l_j in the label set L for instance x_i.

(2) Ranking loss: it is applied to evaluate the average fraction of label pairs that are misordered for an instance. This evaluation index is computed as follows:

Ranking loss = (1/n) Σ_{i=1}^{n} (1 / (|Y_i| |Ȳ_i|)) |{(l_a, l_b) : r_i(l_a) > r_i(l_b), (l_a, l_b) ∈ Y_i × Ȳ_i}|,    (14)

here, Ȳ_i represents the irrelevant label set of x_i and |Ȳ_i| represents the number of irrelevant labels of x_i.

(3) Hamming loss: it is applied to evaluate how many times, on average, an instance-label pair is misclassified, i.e., an irrelevant label is predicted or a relevant label is missing from the prediction result. This evaluation index is computed as follows:

Hamming loss = (1/n) Σ_{i=1}^{n} |Dif(Y_i, Z_i)| / k,    (15)

here, Z_i is the predicted label set of x_i and Dif(Y_i, Z_i) represents the symmetric difference of Y_i and Z_i.

(4) One-error: it is applied to evaluate how many times the top-ranked label is not in the relevant label set. This evaluation index is computed as follows:

One-error = (1/n) Σ_{i=1}^{n} δ(arg min_{l_j ∈ L} r_i(l_j)),    (16)

here, arg min_{l_j ∈ L} r_i(l_j) represents the top-ranked label of x_i. If arg min_{l_j ∈ L} r_i(l_j) ∉ Y_i, then δ(arg min_{l_j ∈ L} r_i(l_j)) = 1, otherwise 0.

(5) Average-precision: it is applied to evaluate the average fraction of relevant labels ranked higher than a particular relevant label. This evaluation index is computed as follows:

Average-precision = (1/n) Σ_{i=1}^{n} (1/|Y_i|) Σ_{l ∈ Y_i} |{l' ∈ Y_i : r_i(l') ≤ r_i(l)}| / r_i(l).    (17)
The characteristics of these evaluation indexes are shown in TABLE 2. In the value indication column, ↓ represents that the smaller the value, the better the multi-label classification performance. ↑ represents that the larger the value, the better the multi-label classification performance.
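Three of the five indexes (hamming loss, one-error and coverage) can be sketched directly from their definitions, using a toy score matrix; the scores, threshold and resulting values below are illustrative only.

```python
import numpy as np

# s[i, j]: classifier confidence that label j is relevant to instance i;
# Y: 0/1 ground-truth indicator matrix. Toy values for two instances.
s = np.array([[0.9, 0.2, 0.6],
              [0.1, 0.8, 0.7]])
Y = np.array([[1, 0, 1],
              [0, 1, 0]])
Z = (s > 0.5).astype(int)          # thresholded predictions

# Rank positions per instance: 1 = top-ranked label.
rank = (-s).argsort(axis=1).argsort(axis=1) + 1

# Hamming loss: fraction of misclassified instance-label pairs.
hamming = np.mean(Z != Y)
# One-error: fraction of instances whose top-ranked label is irrelevant.
one_error = np.mean([Y[i, np.argmax(s[i])] == 0 for i in range(len(Y))])
# Coverage: average depth in the ranking needed to cover all relevant
# labels, minus one.
coverage = np.mean([rank[i][Y[i] == 1].max() for i in range(len(Y))]) - 1

print(hamming, one_error, coverage)
```
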

D. EXPERIMENTAL SETUP AND RESULT ANALYSIS
In this experiment, the radial basis function kernel, i.e., K(x, y) = exp(−γ‖x − y‖₂²), is used in the proposed AEDC-MLSVM classification algorithm and the other three comparable multi-label classification algorithms. The symbol γ indicates the scale factor of the kernel and ‖·‖₂ indicates the Euclidean norm. In order to obtain the optimal representative set, parameters P, V and ε need to be set in the AEDC-MLSVM classification algorithm. And to obtain the optimal solution of the divide-and-conquer strategy, parameters w and β also need to be set. The meanings of these five parameters are given in Algorithm 1. In addition, two parameters, i.e., the termination tolerance e and the loss function penalty parameter C, need to be set in all four multi-label classification algorithms. For the different data sets, the above parameters are obtained through cross validation. The experiment is run on a computer with an Intel i7-8565U CPU and 8GB RAM.
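Grid-style cross validation over γ and C can be sketched as follows. The synthetic scoring surface stands in for measured k-fold accuracy, and the grids and peak location are illustrative assumptions rather than values from the experiments.

```python
import itertools

def cross_val_score(gamma, C):
    """Placeholder for k-fold CV accuracy of the binary SVM; a real run
    would train on folds of the representative set. Here a synthetic
    surface peaking at gamma = 0.1, C = 4 stands in for measured scores."""
    return 1.0 - abs(gamma - 0.1) - 0.01 * abs(C - 4)

grid_gamma = [0.01, 0.1, 1.0]
grid_C = [1, 4, 16]

# Exhaustively evaluate the grid and keep the best (gamma, C) pair.
best = max(itertools.product(grid_gamma, grid_C),
           key=lambda p: cross_val_score(*p))
print(best)
```
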
Through cross validation, the parameter settings for data set TMC2007-500 are as follows: P = 30, V = 15, ε = 1.7, w = 2, β = 3, e = 1.95e−5 and C = 4. TABLE 3 and 4 show the experimental results of the four multi-label classification algorithms on this data set. It can be seen from TABLE 3 and 4 that the performance of AEDC-MLSVM on the five common evaluation indexes is pretty close to that of ML-LIBSVM, but its training and testing time only accounts for 27.9% and 23.4% of ML-LIBSVM's training and testing time respectively. At the same time, the performance of AEDC-MLSVM on the five common evaluation indexes is much better than that of ML-CVM and ML-BVM; especially on average-precision, the value rises by at least 15.3%. Its training and testing time is also less than that of ML-CVM and ML-BVM.
Through cross validation, the parameter settings for data set Mediamill(exp1) are as follows: P = 50, V = 30, ε = 0.2, w = 2, β = 3.3, e = 1.95e−5 and C = 8. TABLE 5 and 6 show the experimental results of the four multi-label classification algorithms on this data set. It can be seen from TABLE 5 and 6 that the performance of AEDC-MLSVM on the five common evaluation indexes is similar to that of ML-LIBSVM, but its training and testing time only accounts for 10.3% and 11.2% of ML-LIBSVM's training and testing time respectively. At the same time, the performance of AEDC-MLSVM on the hamming loss, one-error and average-precision indexes is much better than that of ML-CVM and ML-BVM; especially on average-precision, the value rises by at least 21.1%. Its training and testing time is also less than that of ML-CVM and ML-BVM.
Through cross validation, the parameter settings for data set EukaryoteGO are as follows: P = 30, V = 14, ε = 3.34, w = 2, β = 12, e = 1e−5 and C = 1/4. TABLE 7 and 8 show the experimental results of the four multi-label classification algorithms on this data set. From TABLE 7 and 8, it can be seen that the performance of AEDC-MLSVM on the five common evaluation indexes is pretty close to that of ML-LIBSVM, but its training and testing time only accounts for 23.5% and 9.5% of ML-LIBSVM's training and testing time respectively. At the same time, the performance of AEDC-MLSVM on the five common evaluation indexes is much better than that of ML-CVM and ML-BVM; especially on average-precision, the value rises by at least 28.6%. Its training and testing time is also less than that of ML-CVM and ML-BVM.

V. CONCLUSION
In this paper, to solve the problem that the application of multi-label SVM classification algorithms to large-scale data sets is seriously restricted by heavy time complexity, the AEDC-MLSVM classification algorithm is proposed. This algorithm improves the traditional multi-label SVM classification algorithm by combining the approximate extreme points method and the divide-and-conquer strategy, and the DEC method is used to deal with the label imbalance problem. All of these greatly improve the applicability of this algorithm to large-scale multi-label data sets. The experimental results on three public real-world data sets show that the performance of the AEDC-MLSVM algorithm is pretty close to that of ML-LIBSVM on the five commonly-used evaluation indexes, and superior to that of ML-CVM and ML-BVM, while its training and testing time is greatly reduced. In the future, we will further improve the classification performance of this algorithm by using the correlation information among labels.