Generative adversarial networks for increasing the veracity of big data | IEEE Conference Publication | IEEE Xplore

Abstract:

This work describes how automated data generation integrates into a big data pipeline. A lack of veracity in big data can produce models that are inaccurate or biased by trends in the training data, which can lead to problems that become difficult to overcome as a pipeline matures. This work describes the use of a Generative Adversarial Network (GAN) to generate sketch data, such as the sketches that might be used in a human verification task. The recognizability of the generated sketches is verified with a crowdsourcing methodology: generated sketches were correctly recognized 43.8% of the time, compared with 87.7% for human-drawn sketches. The method is scalable and can be used to generate realistic data in many domains and to bootstrap a training dataset for a model prior to deployment.
Date of Conference: 11-14 December 2017
Date Added to IEEE Xplore: 15 January 2018
Conference Location: Boston, MA, USA
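The abstract reports recognition accuracy as the share of crowd guesses that match the category a sketch was meant to depict. The following is a minimal sketch of that tally. It assumes each crowd response is recorded as a hypothetical (source, true_label, guessed_label) tuple and that a guess counts as correct on a case-insensitive exact string match. The paper's actual response format and matching rule are not given in this excerpt, and the toy responses are invented for illustration, not taken from the study.

```python
from collections import defaultdict


def recognition_rates(responses):
    """Return the fraction of correct crowd guesses per sketch source.

    `responses` is an iterable of (source, true_label, guessed_label)
    tuples, where `source` is e.g. "generated" or "human". This layout
    is a hypothetical choice for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for source, true_label, guessed_label in responses:
        total[source] += 1
        # Assumed matching rule: case-insensitive exact match between the
        # guess and the category the sketch was drawn or generated for.
        if guessed_label.strip().lower() == true_label.strip().lower():
            correct[source] += 1
    # Only sources that actually appear are reported, so no division by zero.
    return {source: correct[source] / total[source] for source in total}


# Toy responses, invented for illustration (not the paper's data).
toy = [
    ("generated", "cat", "cat"),
    ("generated", "cat", "dog"),
    ("generated", "house", "house"),
    ("generated", "house", "boat"),
    ("human", "cat", "cat"),
    ("human", "house", "house"),
    ("human", "house", "House"),
    ("human", "cat", "fox"),
]
print(recognition_rates(toy))  # {'generated': 0.5, 'human': 0.75}
```

Grouping by source makes the comparison in the abstract (generated versus human-drawn sketches) fall out of a single pass over the responses.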

I. Introduction

The utility of automatic, realistic, synthetic data generation in big data problems has been demonstrated for both Velocity [1] and Variety [2]. Generative models are also frequently employed in big data settings to capture trends within data [3], [4]. A generative model is one that models data as a distribution, or a combination of distributions, that can then be sampled. In some cases, this ability to be sampled is a byproduct of the technique used [5]. In other cases, sampling the distribution is the goal [6], because obtaining new or unique data is critical to many applications.
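To make the definition concrete, here is a minimal sketch of a generative model in the sense described above: a weighted combination of normal distributions that can be evaluated and sampled, plus the single-distribution case fitted from observed values. The class name `GaussianMixture1D`, its hand-set parameters, and the sample values are hypothetical and chosen for illustration. A GAN, as used in this work, does not expose an explicit density like this; its generator network learns to produce samples directly.

```python
import random
from statistics import NormalDist


class GaussianMixture1D:
    """Toy 1-D generative model: a weighted combination of normal
    distributions that can be sampled. Parameters are hand-set here;
    in practice they would be estimated from data (e.g. with EM)."""

    def __init__(self, weights, components):
        if len(weights) != len(components):
            raise ValueError("need one weight per component")
        self.weights = weights
        self.components = components  # list of statistics.NormalDist

    def pdf(self, x):
        # Mixture density: normalized weighted sum of component densities.
        total = sum(self.weights)
        return sum(w * c.pdf(x) for w, c in zip(self.weights, self.components)) / total

    def sample(self, n, rng=None):
        rng = rng or random.Random()
        # Pick a component for each draw in proportion to its weight,
        # then draw a value from that component.
        picks = rng.choices(self.components, weights=self.weights, k=n)
        return [rng.gauss(c.mean, c.stdev) for c in picks]


# Single-distribution case: model observed values as one normal distribution,
# then sample new values from it.
observed = [4.8, 5.1, 5.0, 4.9, 5.2]
single = NormalDist.from_samples(observed)
new_values = single.samples(3, seed=0)

# Combination-of-distributions case.
mixture = GaussianMixture1D(
    weights=[0.7, 0.3],
    components=[NormalDist(0.0, 1.0), NormalDist(10.0, 2.0)],
)
draws = mixture.sample(5, rng=random.Random(0))
print(round(single.mean, 2), len(new_values), len(draws))
```

Passing a seeded `random.Random` keeps the draws reproducible, which is convenient when synthetic data is used to bootstrap a dataset and later runs need to regenerate the same samples.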
