By Topic

Generating Synthetic Data to Match Data Mining Patterns

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Eno, J. ; Univ. of Arkansas, Little Rock, AK ; Thompson, C.W.

Synthetic data sets can be useful in a variety of situations, including repeatable regression testing and providing realistic - but not real - data to third parties for testing new software. Researchers, engineers, and software developers can test against a safe data set without affecting or even accessing the original data, insulating them from privacy and security concerns as well as letting them generate larger data sets than would be available using only real data. Practitioners use data mining technology to discover patterns in real data sets that aren't apparent at the outset. This article explores how to combine information derived from data mining applications with the descriptive ability of synthetic data generation software. Our goal is to demonstrate that at least some data mining techniques (in particular, a decision tree) can discover patterns that we can then use to inverse map into synthetic data sets. These synthetic data sets can be of any size and will faithfully exhibit the same (decision tree) patterns. Our work builds on two technologies: synthetic data definition language and predictive model markup language.

Published in:

Internet Computing, IEEE  (Volume:12 ,  Issue: 3 )