Close category search window
 

Comparison of Four Performance Metrics for Evaluating Sampling Techniques for Low Quality Class-Imbalanced Data

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Folleco, Andres ; Florida Atlantic Univ., Boca Raton, FL ; Khoshgoftaar, T.M. ; Napolitano, A.

Erroneous attribute values can significantly impact learning from otherwise valuable data. The learning impact can be exacerbated by the class imbalanced training data. We investigate and compare the overall learning impact of sampling such data by using four distinct performance metrics suitable for models built from binary class imbalanced data. Seven relatively free of noise, class imbalanced software engineering measurement datasets were used. A novel noise injection procedure was applied to these datasets. We injected domain realistic noise into the independent and dependent (class) attributes of randomly selected instances to simulate lower quality measurement data. Seven well known data sampling techniques with the benchmark decision-tree learner C4.5 were used. No other related studies were found that have comprehensively investigated learning by sampling low quality binary class imbalanced data containing both independent and dependent corrupted attributes. Two sampling techniques (random undersampling and Wilson's editing) with better and more robust learning performances were identified. In contrast, all metrics concurred on the identification of the worst performing sampling technique (cluster-based oversampling).

Published in:
Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on

Date of Conference: 11-13 Dec. 2008

Need Help?


IEEE Advancing Technology for Humanity About IEEE Xplore | Contact | Help | Terms of Use | Nondiscrimination Policy | Site Map | Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest professional association for the advancement of technology.
© Copyright 2013 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.