
Allowing Supervision in Unsupervised Deformable-Instances Image-to-Image Translation



Abstract:

Replacing objects in images is a practical functionality of Photoshop, e.g., changing clothes. This task is defined as Unsupervised Deformable-Instances Image-to-Image Translation (UDIT), which maps multiple foreground instances of a source domain to a target domain, involving significant changes in shape. Although previous works incorporate instance masks of the source domain to indicate instance shape, their translations still fail in shape because the shape information in the masks is not adequately utilized. To mitigate this issue, we introduce an effective two-stage pipeline for UDIT called Mask-Guided Deformable-instances GAN++ (MGD-GAN++), which generates target masks in the first stage, named Mask Morph, and uses these masks to guide the synthesis of the corresponding instances in the second stage, named Mask-Guided Image Generation. To further provide sufficient supervision with existing unpaired datasets, an overall set of training schemes is proposed for the two stages of MGD-GAN++, coined Aligned Supervision and Inpainting Supervision, respectively. Extensive experiments on four datasets demonstrate the significant advantages of MGD-GAN++ over existing methods, both quantitatively and qualitatively. Furthermore, our training time is substantially reduced compared to the state of the art.
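
To make the two-stage structure described above concrete, the following is a minimal sketch of how a deformable-instance translation could be wired together. The module names (mask_morph, mask_guided_gen) and their signatures are illustrative assumptions for exposition, not the released MGD-GAN++ implementation.

    import torch
    import torch.nn as nn

    def translate_instances(image: torch.Tensor,
                            source_masks: torch.Tensor,
                            mask_morph: nn.Module,
                            mask_guided_gen: nn.Module) -> torch.Tensor:
        # Stage 1 (Mask Morph): predict target-domain instance masks from
        # the source-domain instance masks, modelling the shape deformation.
        target_masks = mask_morph(source_masks)

        # Stage 2 (Mask-Guided Image Generation): synthesize target-domain
        # instances inside the predicted masks and composite them onto the
        # source image.
        return mask_guided_gen(image, source_masks, target_masks)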
Page(s): 5335 - 5349
Date of Publication: 18 December 2023



I. Introduction

Image-to-Image (I2I) translation aims to learn the mapping between a source and a target domain, and emerged with the proposal of Generative Adversarial Networks (GANs) [2]. Since then, the task has received increasing attention because several visual problems can be cast as I2I translation, such as style transfer [3], [4], super-resolution [5], portrait synthesis [6], [7], [8], label-to-image generation [9], [10], and image inpainting [11]. Great progress has been made in recent years. For example, CycleGAN [12] imposes a cycle-consistency constraint on the generators during training, and UNIT [3] extends the Coupled GAN [13] based on the assumption of a shared latent space. To meet the demand for diverse, multi-modal outputs, methods such as MUNIT [14] and DRIT [15] recombine disentangled image representations. It is noteworthy that the methods above only transfer style over the whole image without considering the characteristics of individual instances.
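
As a concrete illustration of the cycle-consistency constraint used by CycleGAN [12], a minimal PyTorch-style sketch is given below. The generator names G and F and the weight lambda_cyc are illustrative placeholders following common practice, not values or code taken from this paper.

    import torch
    import torch.nn as nn

    l1_loss = nn.L1Loss()

    def cycle_consistency_loss(G: nn.Module, F: nn.Module,
                               real_x: torch.Tensor, real_y: torch.Tensor,
                               lambda_cyc: float = 10.0) -> torch.Tensor:
        # Translate to the other domain and back; the reconstruction should
        # match the original image, which is the cycle-consistency idea.
        rec_x = F(G(real_x))  # X -> Y -> X
        rec_y = G(F(real_y))  # Y -> X -> Y
        return lambda_cyc * (l1_loss(rec_x, real_x) + l1_loss(rec_y, real_y))

Here G and F can be any image-to-image generator networks mapping X to Y and Y to X, respectively; the loss is added to the usual adversarial objectives during training.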

