
DCDL: Dual Causal Disentangled Learning for Zero-Shot Sketch-Based Image Retrieval


Abstract:

Zero-shot sketch-based image retrieval (ZS-SBIR) is a challenging task that hinges on overcoming the cross-domain differences between sketches and images. Previous methods primarily address these cross-domain differences by constructing a common embedding space, which improves final retrieval results. However, most prior approaches have overlooked a critical point: the sketch-based image retrieval task actually requires only the cross-domain invariant information that is relevant to retrieval, while irrelevant information (such as posture, expression, background, and specificity) can detract from retrieval accuracy. In addition, most previous methods perform well on traditional SBIR datasets, but their generalization and extensibility on more diverse and complex data remain understudied. To address these issues, we propose Dual Causal Disentangled Learning (DCDL) for ZS-SBIR. This approach mitigates the negative impact of irrelevant features by separating retrieval-relevant features in the latent variable space. Specifically, we construct a causal disentanglement model using two Variational Autoencoders (VAEs), one for the sketch domain and one for the image domain, to obtain disentangled variables with exchangeable attributes. Our framework integrates causal intervention with disentangled representation learning, enabling a clearer separation of cross-domain retrieval-relevant features from intra-class irrelevant features, which can be recombined into new reconstructed samples. Concurrently, we design a Dual Alignment Module (DAM) that leverages the accurate and comprehensive semantic features provided by a text encoder pre-trained on large-scale data to supplement semantic associations and align the disentangled retrieval-relevant features. The Dual Alignment Module enhances the model's ability to generalize across diverse datasets by effectively aligning retrieval-relevant information from different domains. Extensive experiments demonstrate that our method achieves state-of-the-art performance.
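To make the dual-VAE disentanglement idea concrete, here is a minimal PyTorch-style sketch of one domain branch, assuming hypothetical module names, feature dimensions, and a simple reparameterized latent split; it is an illustration of the described mechanism (splitting the latent code into retrieval-relevant and irrelevant parts and recombining them across samples), not the authors' released implementation.

```python
# Illustrative sketch only (hypothetical names/dimensions, not the DCDL code).
# A per-domain VAE whose latent code is split into a retrieval-relevant part
# z_rel and an irrelevant part z_irr; swapping z_irr between two samples
# yields a recombined reconstruction, as described in the abstract.
import torch
import torch.nn as nn

class DomainVAE(nn.Module):
    def __init__(self, feat_dim=512, rel_dim=64, irr_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        # Mean and log-variance heads over the concatenated latent factors.
        self.mu = nn.Linear(256, rel_dim + irr_dim)
        self.logvar = nn.Linear(256, rel_dim + irr_dim)
        self.decoder = nn.Sequential(
            nn.Linear(rel_dim + irr_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim)
        )
        self.rel_dim = rel_dim

    def encode(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        # Split into retrieval-relevant and irrelevant latent variables.
        return z[:, :self.rel_dim], z[:, self.rel_dim:], mu, logvar

    def decode(self, z_rel, z_irr):
        return self.decoder(torch.cat([z_rel, z_irr], dim=1))

# Usage: recombine relevant/irrelevant codes across two samples of one domain.
sketch_vae = DomainVAE()
x_a, x_b = torch.randn(1, 512), torch.randn(1, 512)
zr_a, zi_a, *_ = sketch_vae.encode(x_a)
zr_b, zi_b, *_ = sketch_vae.encode(x_b)
x_swap = sketch_vae.decode(zr_a, zi_b)  # a's retrieval content, b's irrelevant attributes
```

In the full framework described above, one such branch per domain (sketch and image) would be trained jointly, with the Dual Alignment Module aligning the retrieval-relevant codes across domains using text-encoder semantics.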
Published in: IEEE Transactions on Multimedia (Early Access)
Page(s): 1 - 16
Date of Publication: 18 February 2025
