Abstract:
Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information. Such an embedding, serving as a unified ID representation, can not only encapsulate the characteristics of the same input ID comprehensively, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. Besides, to drive the training of our PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Trained on the dataset constructed through the proposed pipeline, our PhotoMaker demonstrates better ID preservation ability than test-time fine-tuning based methods, yet provides significant speed improvements, high-quality generation results, strong generalization capabilities, and a wide range of applications.
Date of Conference: 16-22 June 2024
Date Added to IEEE Xplore: 16 September 2024