DiffuseFIST: A Fast Image-guided Style Transfer Method for Adapting Large-scale Diffusion Models

Abstract:

Pre-trained text-to-image (T2I) synthesis diffusion models (DM) have shown remarkable capabilities in generating diverse images. However, they struggle to satisfy the user’s requirements because (i) text is inherently imprecise in expressing specific styles and (ii) generation is time-consuming owing to the many iterations in the reverse process of diffusion models. To address these issues, we propose a fast style transfer method that adopts pre-trained large-scale diffusion models, dubbed DiffuseFIST, which adds T-small (300) noise to accelerate the reverse process and requires only real-world images and artistic images as input. Specifically, to preserve content and prevent style leakage, we introduce a Content Injection (CI) strategy that achieves fine-grained control over the generated structure by manipulating spatial features and self-attention inside the model. Furthermore, we design an Iterative Style Guidance (ISG) strategy, which allows explicit user guidance and control of stylization tradeoffs. Finally, we initialize the latent variable with the Whitening and Coloring Transform (WCT) to deal with disharmonious color. Qualitative and quantitative experiments demonstrate that our proposed method surpasses state-of-the-art conventional and diffusion-based style transfer methods.
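
The abstract names the Whitening and Coloring Transform but does not say how it is applied to the latent variable. As a rough illustration only, the generic transform can be sketched in NumPy as below. The function name `whiten_and_color`, the `(C, N)` feature layout, and the `eps` regularizer are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def whiten_and_color(content, style, eps=1e-5):
    """Generic WCT sketch (hypothetical helper, not the paper's code).

    content: float array of shape (C, N), C channels over N positions.
    style:   float array of shape (C, M), same channel count.
    Returns an array of shape (C, N) whose channel mean and covariance
    approximately match those of `style`.
    """
    c_mean = content.mean(axis=1, keepdims=True)
    s_mean = style.mean(axis=1, keepdims=True)
    fc = content - c_mean
    fs = style - s_mean

    # Channel covariance matrices (C x C), regularized so they stay invertible.
    eye = np.eye(content.shape[0])
    cov_c = fc @ fc.T / (fc.shape[1] - 1) + eps * eye
    cov_s = fs @ fs.T / (fs.shape[1] - 1) + eps * eye

    # Whitening: multiply by cov_c^(-1/2) to decorrelate the content channels.
    wc, vc = np.linalg.eigh(cov_c)
    whitened = vc @ np.diag(wc ** -0.5) @ vc.T @ fc

    # Coloring: multiply by cov_s^(1/2), then restore the style mean.
    ws, vs = np.linalg.eigh(cov_s)
    colored = vs @ np.diag(ws ** 0.5) @ vs.T @ whitened
    return colored + s_mean
```

In a latent-diffusion setting, one would presumably flatten the content and style latents to `(C, H*W)` before calling such a function and reshape the result afterwards; that usage is likewise an assumption here.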

Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 06-11 April 2025

Date Added to IEEE Xplore: 07 March 2025

DOI: 10.1109/ICASSP49660.2025.10889203

Conference Location: Hyderabad, India

I. Introduction

The popularity of powerful DM [1]–[3] has led to remarkable progress in the field of content generation. For instance, T2I models [37], [38] are capable of generating diverse and vivid images with the guidance of text prompts. However, textual descriptions are often less expressive and informative than visual representations of styles. A prompt typically offers only a rough description of the material (e.g., "oil", "watercolor" or "sketch"), art movement (e.g., "Impressionism" or "Cubism"), or artist (e.g., "Vincent van Gogh" or "Claude Monet"), which cannot fully evoke the artist’s vibrant color, dramatic light, and rough yet vigorous brushwork [4].
