Abstract:
This work describes how automated data generation integrates in a big data pipeline. A lack of veracity in big data can cause models that are inaccurate, or biased by tre...Show MoreMetadata
Abstract:
This work describes how automated data generation integrates in a big data pipeline. A lack of veracity in big data can cause models that are inaccurate, or biased by trends in the training data. This can lead to issues as a pipeline matures that are difficult to overcome. This work describes the use of a Generative Adversarial Network to generate sketch data, such as those that might be used in a human verification task. These generated sketches are verified as recognizable using a crowd-sourcing methodology, and finds that the generated sketches were correctly recognized 43.8% of the time, in contrast to human drawn sketches which were 87.7% accurate. This method is scalable and can be used to generate realistic data in many domains and bootstrap a dataset used for training a model prior to deployment.
Date of Conference: 11-14 December 2017
Date Added to IEEE Xplore: 15 January 2018
ISBN Information: