DiffuseFIST: A Fast Image-guided Style Transfer Method for Adapting Large-scale Diffusion Models

Abstract:

Pre-trained text-to-image (T2I) diffusion models (DMs) have shown remarkable capabilities in generating diverse images. However, they struggle to satisfy users' requirements because (i) text is inherently imprecise in expressing specific styles, and (ii) generation is time-consuming due to the many iterations in the reverse diffusion process. To address these issues, we propose a fast style transfer method built on pre-trained large-scale diffusion models, dubbed DiffuseFIST, which adds noise for only T_small (300) steps to accelerate the reverse process and requires only real-world images and artistic images as input. Specifically, to preserve content and prevent style leakage, we introduce a Content Injection (CI) strategy that achieves fine-grained control over the generated structure by manipulating spatial features and self-attention inside the model. Furthermore, we design an Iterative Style Guidance (ISG) strategy that allows explicit user guidance and control over the stylization tradeoff. Finally, we initialize the latent variable with the Whitening and Coloring Transform (WCT) to correct disharmonious colors. Qualitative and quantitative experiments demonstrate that our proposed method surpasses both conventional and diffusion-based state-of-the-art style transfer methods.
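The WCT step mentioned above is a standard, well-defined operation, so it can be sketched concretely. The sketch below is illustrative only: the paper's exact feature space, implementation, and the function name `wct` are assumptions, and plain NumPy arrays stand in for whatever latent representation the method actually uses. The transform whitens the channel covariance of the content features, then re-colors them with the style features' covariance and mean:

```python
import numpy as np

def wct(content, style, eps=1e-5):
    """Whitening and Coloring Transform (illustrative sketch).

    content, style: arrays of shape (C, N) -- C channels, N spatial positions.
    Returns content features carrying the style's channel statistics.
    """
    # Center both feature sets per channel
    mu_c = content.mean(axis=1, keepdims=True)
    mu_s = style.mean(axis=1, keepdims=True)
    fc = content - mu_c
    fs = style - mu_s

    # Whitening: remove the content's channel correlations.
    # eps regularizes the covariance so the inverse square root is stable.
    cov_c = fc @ fc.T / (fc.shape[1] - 1) + eps * np.eye(fc.shape[0])
    dc, Ec = np.linalg.eigh(cov_c)
    whitened = Ec @ np.diag(dc ** -0.5) @ Ec.T @ fc

    # Coloring: impose the style's channel correlations
    cov_s = fs @ fs.T / (fs.shape[1] - 1) + eps * np.eye(fs.shape[0])
    ds, Es = np.linalg.eigh(cov_s)
    colored = Es @ np.diag(ds ** 0.5) @ Es.T @ whitened

    # Shift to the style's per-channel mean
    return colored + mu_s
```

Used as a latent initializer, this makes the starting point of the reverse process already match the style image's color statistics, which is one plausible way to address the "disharmonious color" issue the abstract describes.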
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India


I. Introduction

The popularity of powerful DMs [1]–[3] has led to remarkable progress in the field of content generation. For instance, T2I models [37], [38] are capable of generating diverse and vivid images under the guidance of text prompts. However, textual descriptions are often less expressive and informative than visual representations of styles: a rough description of only the medium (e.g., "oil", "watercolor" or "sketch"), art movement (e.g., "Impressionism" or "Cubism"), or artist (e.g., "Vincent van Gogh" or "Claude Monet") cannot fully evoke an artist's vibrant color, dramatic light, and rough yet vigorous brushwork [4].

