Abstract:
The ability to automatically generate realistic images from textual input is a challenging and important goal in artificial intelligence. In this research, a novel approach is presented that combines RoBERTa, a transformer-based language model, with Generative Adversarial Networks (GANs) to synthesize high-quality images from textual descriptions. The proposed architecture uses the RoBERTa model to encode the text input and a Generative Adversarial Network to map that encoding to pixels, producing an image that closely represents the input. The quality of the synthesized images is measured using the Fréchet Inception Distance (FID) and Inception Score (IS) metrics. Three variants of the architecture are proposed, and the experiments demonstrate that this approach can produce realistic images. The results indicate that transformer-based language models can effectively be combined with GANs for image synthesis, thus paving the way for further research in this area.
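The conditioning pipeline the abstract describes (a RoBERTa text encoding driving a GAN generator) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's architecture: the layer sizes, the 64x64 output resolution, and the concatenation of a noise vector with the text embedding are all assumptions; in the real system the embedding would come from a pretrained RoBERTa model (hidden size 768 for RoBERTa-base), which is stubbed out here with a random tensor.

```python
import torch
import torch.nn as nn

TEXT_DIM = 768   # RoBERTa-base hidden size (assumed as the text-embedding dim)
NOISE_DIM = 100  # illustrative GAN latent size

class TextConditionedGenerator(nn.Module):
    """Toy generator: concatenates a text embedding with noise and
    maps the result to an RGB image tensor."""

    def __init__(self, text_dim=TEXT_DIM, noise_dim=NOISE_DIM, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(text_dim + noise_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, text_emb, noise):
        x = torch.cat([text_emb, noise], dim=1)
        img = self.net(x)
        return img.view(-1, 3, self.img_size, self.img_size)

# In the described approach, text_emb would be RoBERTa's sentence
# representation of the caption; a random stand-in is used here.
gen = TextConditionedGenerator()
text_emb = torch.randn(2, TEXT_DIM)
noise = torch.randn(2, NOISE_DIM)
images = gen(text_emb, noise)
print(tuple(images.shape))  # (2, 3, 64, 64)
```

In a full GAN setup, a discriminator would score these images (typically also conditioned on the same text embedding), and the adversarial training loop would push the generator toward images that match the caption; sample quality would then be evaluated with FID and IS as the abstract states.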
Published in: 2023 International Conference on Next-Generation Computing, IoT and Machine Learning (NCIM)
Date of Conference: 16-17 June 2023
Date Added to IEEE Xplore: 21 August 2023