Abstract:
Hierarchical multilabel classification (HMC) allows an instance to have multiple labels residing in a hierarchy. A popular loss function used in HMC is the H-loss, which penalizes only the first classification mistake along each prediction path. However, the H-loss is defined only for tree-structured label hierarchies and cannot be applied to DAG-structured ones. Moreover, because it does not penalize every misclassification in the hierarchy, it can lead to misleading predictions. In this paper, we overcome these deficiencies by proposing a hierarchy-aware loss function that is more appropriate for HMC. Using Bayesian decision theory, we then develop a Bayes-optimal classifier with respect to this loss function. Instead of requiring an exhaustive summation and search for the optimal multilabel, the resulting classification problem can be solved efficiently by a greedy algorithm on both tree- and DAG-structured label hierarchies. Experimental results on a large number of real-world data sets show that the proposed algorithm outperforms existing HMC methods.
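To make the behaviour of the H-loss concrete, the following minimal sketch computes it on a small tree. It illustrates only the existing H-loss with unit costs, not the loss function or greedy algorithm proposed in the paper. The function name `h_loss`, the `parent` map, and the toy labels are hypothetical and chosen purely for illustration.

```python
def h_loss(parent, y_true, y_pred):
    """H-loss with unit costs on a tree (illustrative sketch).

    parent: maps each node to its parent (None for the root).
    A node is charged only if it is misclassified and every one of
    its ancestors is classified correctly, i.e. it is the first
    mistake on its path from the root.
    """
    total = 0
    for node in parent:
        if y_true[node] == y_pred[node]:
            continue  # correct node, no penalty
        # Walk up the tree looking for an earlier mistake.
        ancestor = parent[node]
        first_mistake = True
        while ancestor is not None:
            if y_true[ancestor] != y_pred[ancestor]:
                first_mistake = False
                break
            ancestor = parent[ancestor]
        if first_mistake:
            total += 1
    return total


# Toy hierarchy (an assumption): root -> {a, b}, a -> {a1, a2}
parent = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a"}
y_true = {"root": 1, "a": 1, "a1": 1, "a2": 0, "b": 0}
y_pred = {"root": 1, "a": 0, "a1": 0, "a2": 0, "b": 1}

loss = h_loss(parent, y_true, y_pred)
hamming = sum(y_true[n] != y_pred[n] for n in parent)
print(loss, hamming)
```

In this toy case three nodes are misclassified (`a`, `a1`, `b`), but the mistake at `a1` lies below the mistake at `a` and is therefore not charged. This is the sense in which not all misclassifications in the hierarchy are penalized.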
Published in: 2012 IEEE 12th International Conference on Data Mining
Date of Conference: 10-13 December 2012
Date Added to IEEE Xplore: 17 January 2013