A Comparison of External Clustering Evaluation Indices in the Context of Imbalanced Data Sets

6 Author(s): Marcilio C. P. de Souto (Centro de Inf., Univ. Fed. de Pernambuco, Recife, Brazil); André L. V. Coelho; Katti Faceli; Tiemi C. Sakata; et al.

For highly imbalanced data sets, almost all instances belong to one class, whereas far fewer belong to the other classes. In this paper, we present an empirical comparison of seven clustering evaluation indices when used to assess partitions generated from highly imbalanced data sets. Some of these indices are based on set matching (F-measure), information theory (normalized mutual information and adjusted mutual information), or counting pairs of objects (Rand and adjusted Rand indices). We also investigate the BCubed metric, which combines the concepts of precision and recall with pair counting. Furthermore, to avoid the effect of class size imbalance, we propose a modification of the Rand index, referred to as the normalized class size Rand (NCR) index. Our experiments indicate that, apart from NCR, none of the analyzed indices deals properly with class size imbalance.
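The abstract names the indices but does not define them. The following is a minimal Python sketch of three of them, assuming the standard textbook definitions (pair-counting Rand index, the Hubert–Arabie adjusted Rand index, and item-averaged BCubed precision/recall); these definitions are assumptions and are not taken from the paper. The function names `pair_counts`, `rand_index`, `adjusted_rand_index` and `bcubed` are hypothetical. The NCR index is not sketched, because the abstract does not give its definition.

```python
from collections import Counter
from math import comb


def pair_counts(truth, pred):
    """Pair-counting quantities (assumes at least two items).

    a: pairs in the same class and the same cluster
    b: pairs in the same class but different clusters
    c: pairs in different classes but the same cluster
    d: pairs in different classes and different clusters
    """
    n_pairs = comb(len(truth), 2)
    # Contingency-table cells: items sharing both a class and a cluster.
    a = sum(comb(k, 2) for k in Counter(zip(truth, pred)).values())
    same_class = sum(comb(k, 2) for k in Counter(truth).values())
    same_cluster = sum(comb(k, 2) for k in Counter(pred).values())
    b = same_class - a
    c = same_cluster - a
    d = n_pairs - a - b - c
    return a, b, c, d, n_pairs


def rand_index(truth, pred):
    """Fraction of pairs on which the two partitions agree."""
    a, _, _, d, n_pairs = pair_counts(truth, pred)
    return (a + d) / n_pairs


def adjusted_rand_index(truth, pred):
    """Rand index corrected for chance (Hubert-Arabie form)."""
    a, b, c, _, n_pairs = pair_counts(truth, pred)
    same_class, same_cluster = a + b, a + c
    expected = same_class * same_cluster / n_pairs
    max_index = (same_class + same_cluster) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (a - expected) / (max_index - expected)


def bcubed(truth, pred):
    """Item-averaged BCubed precision, recall and their harmonic mean."""
    both = Counter(zip(truth, pred))
    class_size = Counter(truth)
    cluster_size = Counter(pred)
    n = len(truth)
    # Each item counts itself, so precision and recall are always > 0.
    precision = sum(both[(t, p)] / cluster_size[p] for t, p in zip(truth, pred)) / n
    recall = sum(both[(t, p)] / class_size[t] for t, p in zip(truth, pred)) / n
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f
```

As a usage example, take a hypothetical 95/5 class split (`truth = [0] * 95 + [1] * 5`) and a trivial partition that puts every item in one cluster (`[0] * 100`). Here `rand_index` returns about 0.904 and `bcubed` gives precision 0.905 with recall 1.0, even though the partition separates nothing; on this particular example `adjusted_rand_index` returns 0. Counting pairs makes the imbalance visible: 4475 of the 4950 pairs lie inside the majority class, so a single cluster already "agrees" on most of them.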

Published in: 2012 Brazilian Symposium on Neural Networks

Date of Conference: 20-25 Oct. 2012