Generating Synthetic Data from Large Language Models


Abstract:

Data collection for studying social phenomena is costly and, even at best, time-consuming and tedious. Tools that ease data collection can therefore speed up such studies and improve their efficiency. In this contribution, we argue that in some cases Large Language Models (LLMs) may serve as a tool to generate data for studying social phenomena. The rationale is that LLMs absorb vast amounts of data of various types and from various sources, and embed (an abstraction of) that data in their models. Querying these models generates synthetic data that can be considered a good approximation of the data on which they were trained. This paper discusses the methodological and practical issues involved in this rationale. By means of a use case, we illustrate how synthetic data can be generated (or collected) from GPT and how the data can be used for studying stereotypical views on social groups.
Date of Conference: 14-15 November 2023
Date Added to IEEE Xplore: 25 December 2023
Conference Location: Al Ain, United Arab Emirates
