Abstract:
The diffusion transformer (DiT) architecture has attracted much attention in image generation, achieving better fidelity, performance, and diversity. However, most existing DiT-based image generation methods perform global-aware synthesis, and regional prompt control remains less explored. In this paper, we propose a coarse-to-fine generation pipeline for regional prompt-following generation. Specifically, we first leverage a powerful large language model (LLM) to generate a high-level description of the image (such as content, topic, and objects) and a low-level description (such as details and style). Then we explore the influence of cross-attention layers at different depths. We discover that deeper layers are responsible for high-level content control, while shallower layers handle low-level content control. The different prompts are injected into the proposed regional cross-attention control for coarse-to-fine generation. Using the proposed pipeline, we improve the controllability of DiT-based image generation. Extensive quantitative and qualitative results demonstrate that our pipeline improves generation performance. Our codes are available at https://github.com/ZhenXiong-dl/ICASSP2025-RCAC.
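The depth-dependent prompt injection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the layer count, the `route_prompt` function, and the half-depth split between shallow and deep layers are all assumptions for clarity.

```python
# Hypothetical sketch of depth-dependent prompt routing in a DiT.
# Shallow cross-attention layers attend to the low-level prompt
# (details, style); deep layers attend to the high-level prompt
# (content, topic, objects), as the abstract describes.

NUM_LAYERS = 28  # assumed transformer depth, for illustration only

def route_prompt(layer_idx: int, num_layers: int = NUM_LAYERS) -> str:
    """Return which prompt a cross-attention layer at this depth uses."""
    if layer_idx < num_layers // 2:
        return "low_level"   # shallow half: details / style
    return "high_level"      # deep half: content / topic / objects

# One routing decision per cross-attention layer:
routing = [route_prompt(i) for i in range(NUM_LAYERS)]
```

In a real pipeline, each entry of `routing` would select which LLM-generated prompt embedding is fed to that layer's regional cross-attention; the split point here is an arbitrary choice, not the one reported by the authors.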
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025