Evaluating the impact of data quality on sampling

Authors:
Van Hulse, J. (Dept. of Comput. & Electr. Eng. & Comput. Sci., Florida Atlantic Univ., Boca Raton, FL, USA); Khoshgoftaar, T.M.; Napolitano, A.

Three important data characteristics that can substantially impact a data mining project are class imbalance, poor data quality and the size of the training dataset. Data sampling is a commonly used method for improving learner performance when data is imbalanced. However, little effort has been put forth to investigate the performance of data sampling techniques when data is both noisy and imbalanced. In this work, we present a comprehensive empirical investigation of how data sampling techniques react to changes in four training dataset characteristics: dataset size, class distribution, noise level and noise distribution. We present the performance of four common data sampling techniques using 11 learning algorithms. The results, which are based on an extensive suite of experiments for which over 15 million models were trained and evaluated, show that data sampling can be very effective at dealing with the combined problems of noise and imbalance. In addition, the dataset characteristics which have the greatest impact on each of the data sampling techniques are identified.
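As a rough illustration of the setting the abstract describes (class noise combined with class imbalance, then data sampling), the following is a minimal sketch. The abstract does not name the four sampling techniques or the noise-injection procedure, so random undersampling and uniform label flipping are used here purely as assumptions; the function names `inject_class_noise` and `random_undersample` are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def inject_class_noise(labels, noise_rate, rng):
    """Flip a fraction of binary labels (0/1) to simulate class noise.

    Assumption: noise is spread uniformly over all examples; the paper
    studies several noise distributions that are not reproduced here.
    """
    noisy = list(labels)
    n_flip = int(round(noise_rate * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy


def random_undersample(rows, labels, rng):
    """Randomly drop majority-class examples until both classes are equal in size."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    n_keep = counts[minority]
    keep = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    keep += rng.sample(majority_idx, n_keep)
    keep.sort()  # preserve the original example order
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy dataset: 10 positive and 90 negative examples (illustrative values only).
rng = random.Random(0)
rows = list(range(100))
labels = [1] * 10 + [0] * 90

noisy_labels = inject_class_noise(labels, noise_rate=0.05, rng=rng)
bal_rows, bal_labels = random_undersample(rows, noisy_labels, rng)
print(Counter(noisy_labels), Counter(bal_labels))
```

In an experiment of the kind the abstract outlines, a learner would be trained on `bal_rows` and `bal_labels` and evaluated on clean held-out data, with dataset size, class distribution, noise level, and noise distribution varied in turn.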

Published in:

Information Reuse and Integration (IRI), 2010 IEEE International Conference on

Date of Conference:

4-6 Aug. 2010
