I. Introduction
Compared with conventional single-label classification, multi-label classification is more general in practice, since it allows one instance to carry more than one label simultaneously. For example, an image of a seashore at sunset can have the labels sun, rock, and sea at the same time. Because of this generality, multi-label classification has attracted substantial attention from researchers. Existing approaches to multi-label classification fall into two main groups: problem transformation methods and algorithm adaptation methods [1].

A problem transformation method converts a multi-label classification problem into one or more single-label classification problems. Two common problem transformation methods are Binary Relevance (BR) and Label Powerset (LP) ([2], [3]). BR assumes that labels are independent ([4], [5], [6], [7]) and trains a separate single-label classifier for each label. During testing, each classifier either decides whether its corresponding label should be assigned or outputs a confidence score for further judgment. LP treats each distinct combination of labels as a single new label. An apparent drawback of this approach is that many of the generated labels are supported by very few instances.

In contrast to problem transformation methods, algorithm adaptation methods extend existing single-label classification approaches to handle multi-label data directly. Many single-label classifiers have been extended in this way, including logistic regression ([8]), k-nearest neighbors (KNN) ([9], [10]), decision trees ([11]), and support vector machines (SVM) ([12], [13], [14], [15]). Ensemble methods have also been applied to multi-label classification ([16], [17], [18], [19], [20]). Despite these efforts, all of the aforementioned methods have some drawbacks.
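The two transformations described above can be illustrated with a small sketch. The toy dataset, the label names (echoing the seashore example), and the function names below are illustrative assumptions, not part of any cited method's implementation; the sketch only shows how BR yields one binary dataset per label while LP maps each label combination to a single class.

```python
# Toy multi-label dataset: each instance has a feature vector and a set of labels.
# Features and label names are made up for illustration.
instances = [
    {"features": [0.9, 0.1], "labels": {"sun", "sea"}},
    {"features": [0.2, 0.8], "labels": {"rock"}},
    {"features": [0.7, 0.6], "labels": {"sun", "rock", "sea"}},
]
all_labels = sorted({lab for inst in instances for lab in inst["labels"]})


def binary_relevance(instances, all_labels):
    """BR: one binary (present / absent) training set per label.

    Each resulting dataset would be fed to its own single-label classifier.
    """
    return {
        label: [(inst["features"], label in inst["labels"]) for inst in instances]
        for label in all_labels
    }


def label_powerset(instances):
    """LP: each distinct label combination becomes one class.

    Rare combinations end up with very few supporting instances,
    which is the drawback noted in the text.
    """
    return [(inst["features"], frozenset(inst["labels"])) for inst in instances]


br_datasets = binary_relevance(instances, all_labels)   # 3 binary problems
lp_dataset = label_powerset(instances)                  # 3 distinct classes here
```

Note how BR discards label co-occurrence information (each binary problem is independent), whereas LP preserves it at the cost of a class space that grows with the number of observed combinations.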