Enhancing Image Generation Fidelity via Progressive Prompts | IEEE Conference Publication | IEEE Xplore

Enhancing Image Generation Fidelity via Progressive Prompts


Abstract:

Diffusion transformer (DiT) architectures have attracted much attention in image generation, achieving better fidelity, performance, and diversity. However, most existing DiT-based image generation methods perform global-aware synthesis, and regional prompt control is less explored. In this paper, we propose a coarse-to-fine generation pipeline for regional prompt-following generation. Specifically, we first leverage a powerful large language model (LLM) to generate a high-level description of the image (such as content, topic, and objects) and a low-level description of the image (such as details and style). We then explore the influence of cross-attention layers at different depths. We find that deeper layers are always responsible for high-level content control, while shallow layers handle low-level content control. The various prompts are injected into the proposed regional cross-attention control for coarse-to-fine generation. Using the proposed pipeline, we improve the controllability of DiT-based image generation. Extensive quantitative and qualitative results demonstrate that our pipeline improves generation performance. Our code is available at https://github.com/ZhenXiong-dl/ICASSP2025-RCAC.
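
The idea summarized in the abstract (region-restricted cross-attention, with the prompt type chosen by layer depth) can be illustrated with a small NumPy sketch. This is not the authors' implementation; the repository linked above is the authoritative source. The function names, the single-head attention without learned projections, the 50% depth split, and the averaging of overlapping regions are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, text_tokens):
    # Single-head cross-attention without learned projections (a simplification):
    # image tokens (queries) attend to the text tokens of one prompt.
    d = queries.shape[-1]
    scores = queries @ text_tokens.T / np.sqrt(d)
    return softmax(scores) @ text_tokens

def regional_cross_attention(queries, region_masks, region_prompts):
    # Hypothetical regional control: each image token attends only to the
    # prompt(s) of the region(s) whose boolean mask covers it.
    # Overlapping regions are averaged; uncovered tokens receive zeros.
    out = np.zeros_like(queries)
    weight = np.zeros((queries.shape[0], 1))
    for mask, prompt in zip(region_masks, region_prompts):
        if mask.any():
            out[mask] += cross_attention(queries[mask], prompt)
            weight[mask] += 1.0
    return out / np.maximum(weight, 1.0)

def select_prompts(layer_idx, num_layers, high_level, low_level, split=0.5):
    # Assumed routing rule based on the abstract's finding: shallow layers get
    # the low-level (detail/style) prompts, deeper layers the high-level
    # (content/topic/object) prompts. The split point is an arbitrary choice.
    return high_level if layer_idx >= int(split * num_layers) else low_level

# Toy setup: 16 image tokens of width 8, split into a left and a right region.
rng = np.random.default_rng(0)
N, d, num_layers = 16, 8, 4
queries = rng.normal(size=(N, d))
left = np.arange(N) < N // 2
masks = [left, ~left]
# Random stand-ins for encoded prompts; a real pipeline would use a text encoder.
high = [rng.normal(size=(3, d)), rng.normal(size=(3, d))]
low = [rng.normal(size=(5, d)), rng.normal(size=(5, d))]

x = queries
for layer in range(num_layers):
    prompts = select_prompts(layer, num_layers, high, low)
    x = x + regional_cross_attention(x, masks, prompts)  # residual update
```

In this toy setup the two regions never exchange prompt information, which is the point of regional control; how the paper's method actually combines regional and global conditioning is not specified in the abstract.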
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India
