Abstract:
With the massive growth of multimedia data, local devices are increasingly unable to meet data processing demands, so using cloud server resources becomes a better choice. To prevent privacy leakage, user data can only be stored as ciphertext. Existing cross-media retrieval schemes operate only on plaintext data and cannot protect privacy, which makes privacy-preserving cross-modal retrieval over cloud data a research priority. In this paper, we propose a CLIP-based privacy-preserving cross-modal retrieval scheme. First, the CLIP encoder is used to map images and texts into a common feature space to enhance retrieval accuracy. Subsequently, a feature transformation network trained with a mutual information loss reduces the correlation between the transformed features and the original features, which strengthens privacy preservation. Finally, a transformer encoder-based hash learning method embeds the transformed features into compact hash codes and is combined with contrastive learning to make the codes more discriminative. Experimental results on cross-modal datasets verify the efficiency and accuracy of the proposed scheme.
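The retrieval step implied by compact hash codes can be illustrated with a small sketch. It is not the paper's method: the abstract does not specify the hash network, so a plain sign-based binarization stands in for the learned transformer encoder, and the function names `to_hash_code`, `hamming_distance`, and `retrieve` are hypothetical. The sketch only shows why hash codes make retrieval cheap: ranking reduces to counting differing bits.

```python
def to_hash_code(features):
    """Binarize a real-valued feature vector: 1 if the value is >= 0, else 0.

    Assumption: stands in for the learned hash function, which the abstract
    does not describe in detail.
    """
    return [1 if x >= 0 else 0 for x in features]


def hamming_distance(a, b):
    """Count the positions where two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database codes closest to the query.

    Ties keep their original database order because sorted() is stable.
    """
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
    return ranked[:top_k]


# Toy data: the query would come from one modality (e.g. text features) and
# the database entries from the other (e.g. image features).
query = to_hash_code([0.3, -1.2, 0.0, 2.5])
database = [
    to_hash_code([0.9, -0.4, 0.1, 1.7]),
    to_hash_code([-0.5, 0.8, -0.2, -1.1]),
    to_hash_code([0.2, -0.7, 0.6, -0.3]),
]
nearest = retrieve(query, database, top_k=2)
```

In this toy setup the server compares only binary codes, never the original images or texts, which is the property a hashing-based private retrieval scheme relies on.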
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025