I. Introduction
The popularity of powerful diffusion models (DMs) [1]–[3] has led to remarkable progress in content generation. For instance, text-to-image (T2I) models [37], [38] can generate diverse and vivid images under the guidance of text prompts. However, textual descriptions of style are often less expressive and informative than visual representations. A prompt typically gives only a rough description of the material (e.g., "oil", "watercolor", or "sketch"), the art movement (e.g., "Impressionism" or "Cubism"), or the artist (e.g., "Vincent van Gogh" or "Claude Monet"), which cannot fully evoke the artist’s vibrant color, dramatic light, and rough yet vigorous brushwork [4].