Abstract:
The limitations of acquisition equipment often result in scene image data of limited size, posing a challenge for comprehensive analysis of social image datasets. Advances in generative models have introduced image outpainting techniques that expand the size of acquired social scene images, thereby enhancing the value of social image data. Stable Diffusion (SD), which benefits from the guidance of caption prompts, shows excellent performance in image outpainting. However, its heavy reliance on manual prompts leads to a significant drawback: the quality of generated images degrades when prompts are unavailable. To overcome this challenge, we propose a novel self-prompt diffusion model for image outpainting that extrapolates images based on the semantics of the source image, thereby removing the dependence on manual prompts. Specifically, we design a prompt autoencoder that uses an autoregressive transformer to map prompt embeddings into their semantic space, facilitating the construction of a semantic decoder. The semantic decoder and prompt embeddings are then co-optimized within the proposed prompt embedding network, allowing image features to be mapped to SD prompt embeddings. Furthermore, by exploiting the inherent generative capabilities of diffusion models, we introduce a seam line regeneration mechanism to address the common problem of seam lines when splicing input and generated images. Comparative experiments on the Places2 and COCO datasets show that our method outperforms current state-of-the-art approaches on visual quality metrics and is adaptable to the SD model without additional fine-tuning.
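A minimal, hypothetical sketch of the "self-prompt" idea described above: source-image features are mapped to SD-style prompt embeddings so that outpainting needs no manual text prompt. The module name, feature dimension, and the sequence shape (77 tokens of 768 dimensions, matching SD's text-encoder output) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PromptEmbeddingNetwork(nn.Module):
    """Hypothetical mapping from image features to pseudo prompt embeddings
    shaped like SD text-encoder output (77 tokens x 768 dims)."""
    def __init__(self, feat_dim=1024, seq_len=77, embed_dim=768):
        super().__init__()
        self.seq_len = seq_len
        self.embed_dim = embed_dim
        self.proj = nn.Linear(feat_dim, seq_len * embed_dim)

    def forward(self, image_feats):              # (B, feat_dim)
        x = self.proj(image_feats)                # (B, seq_len * embed_dim)
        return x.view(-1, self.seq_len, self.embed_dim)

# Schematic usage (shapes only; the image encoder and SD UNet are assumed):
# feats   = image_encoder(masked_image)                    # (B, 1024)
# prompt  = PromptEmbeddingNetwork()(feats)                # (B, 77, 768)
# noise   = sd_unet(latents, t, encoder_hidden_states=prompt)
```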
Published in: IEEE Transactions on Computational Social Systems (Early Access)