Abstract:
Diffusion models have recently become increasingly popular in a number of computer vision tasks, but they fail to achieve satisfactory results for unsupervised image-to-image translation, since they require massive training data and rely heavily on extra guidance. In this scenario, GANs can alleviate these issues, albeit with suboptimal generation quality. In this paper, we leverage the advantages of both GANs and diffusion models by training GANs with diffusion supervision in latent spaces (LaDiffGAN) to solve the unsupervised image-to-image translation task. Firstly, to improve style transfer quality, we encode the data into latent spaces that capture the styles of the source and target domains. Secondly, we introduce a diffusion process that injects varying amounts of Gaussian noise to enhance the capability of GANs to model complex data distributions. We accordingly design a latent diffusion GAN loss that aligns the latent features of generated and training images. Lastly, we introduce a heterogeneous conditional denoising loss that incorporates image-level supervision to further improve the quality of the generated results. LaDiffGAN significantly alleviates the drawbacks associated with diffusion models, such as data leakage, high inference cost, and heavy dependence on large training datasets. Extensive experiments show that LaDiffGAN outperforms previous GAN models and delivers performance comparable to, or even better than, diffusion models.
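
To illustrate the latent diffusion GAN loss described in the abstract, the sketch below shows one plausible form in PyTorch: latent features of training (real) and generated images are diffused with timestep-dependent Gaussian noise, and a timestep-conditioned discriminator aligns the two noised distributions. This is a minimal sketch under stated assumptions; all identifiers (diffuse, latent_diffusion_gan_loss, ToyDiscriminator), the linear noise schedule, and the non-saturating loss form are illustrative and not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # assumed linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

def diffuse(z0, t):
    """Forward diffusion q(z_t | z_0): mix latent z0 with Gaussian noise at step t."""
    a = alpha_bar[t].view(-1, 1)                # per-sample signal scale, shape (B, 1)
    eps = torch.randn_like(z0)
    return a.sqrt() * z0 + (1.0 - a).sqrt() * eps

def latent_diffusion_gan_loss(D, z_real, z_fake):
    """Non-saturating GAN loss on diffused latents of training vs. generated images."""
    t = torch.randint(0, T, (z_real.size(0),))  # random noise level per sample
    zt_real, zt_fake = diffuse(z_real, t), diffuse(z_fake, t)
    d_loss = (F.softplus(-D(zt_real, t)).mean()
              + F.softplus(D(zt_fake.detach(), t)).mean())
    g_loss = F.softplus(-D(zt_fake, t)).mean()
    return d_loss, g_loss

# Toy usage with a hypothetical timestep-conditioned discriminator.
class ToyDiscriminator(torch.nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = torch.nn.Linear(dim + 1, 1)

    def forward(self, z, t):
        t_feat = (t.float() / T).view(-1, 1)    # crude scalar timestep embedding
        return self.net(torch.cat([z, t_feat], dim=1))

D = ToyDiscriminator()
z_real, z_fake = torch.randn(8, 128), torch.randn(8, 128)
d_loss, g_loss = latent_diffusion_gan_loss(D, z_real, z_fake)
```

Diffusing both real and fake latents before discrimination is what lets the discriminator supervise the generator across a range of noise levels, which is the general intuition behind combining GAN training with a diffusion process.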
Date of Conference: 17-18 June 2024
Date Added to IEEE Xplore: 27 September 2024