DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance | IEEE Conference Publication | IEEE Xplore

Abstract:

Image-to-video generation, which aims to generate a video starting from a given reference image, has drawn great attention. Existing methods frequently integrate semantic information from images or simply concatenate images, which often leads to low fidelity and flickering in the generated videos. To tackle these problems, we propose DreamVideo, a high-fidelity image-to-video generation method that adds a frame retention branch to a pre-trained video diffusion model. DreamVideo perceives the reference image via convolution layers and concatenates the resulting features with the noisy latents as model input. In this way, the details of the reference image can be preserved to the greatest extent. In addition, by incorporating the designed double-condition classifier-free guidance, DreamVideo can generate high-quality videos of different actions when given different text prompts. We conduct comprehensive experiments on public datasets, and both quantitative and qualitative results indicate that our method outperforms the state-of-the-art method.
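
The abstract does not give the formula for the double-condition classifier-free guidance. As a rough illustration only, the sketch below assumes one common two-condition formulation, in which the denoiser is evaluated three times (no condition, image only, image plus text) and the predictions are combined with two separate scales. The function name, argument names, and the exact combination rule are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def double_condition_cfg(
    eps_uncond: Sequence[float],
    eps_image: Sequence[float],
    eps_full: Sequence[float],
    image_scale: float,
    text_scale: float,
) -> List[float]:
    """Combine three noise predictions with two guidance scales.

    Assumed (hypothetical) rule, applied element-wise:
        eps = eps_uncond
              + image_scale * (eps_image - eps_uncond)
              + text_scale  * (eps_full  - eps_image)

    eps_uncond: prediction with neither image nor text condition
    eps_image:  prediction with the image condition only
    eps_full:   prediction with both image and text conditions
    """
    if not (len(eps_uncond) == len(eps_image) == len(eps_full)):
        raise ValueError("all predictions must have the same length")
    return [
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]


if __name__ == "__main__":
    # Toy flattened predictions standing in for real latent tensors.
    uncond = [0.0, 1.0]
    image_only = [1.0, 1.0]
    image_and_text = [2.0, 3.0]
    print(double_condition_cfg(uncond, image_only, image_and_text, 2.0, 3.0))
```

Under this assumed rule, setting both scales to 1 recovers the fully conditioned prediction, while raising the text scale alone pushes the sample further toward the prompt without changing how strongly the reference image is enforced.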
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India
