Abstract:
With the exponential growth of diverse multimodal data, traditional unimodal retrieval methods struggle to meet the needs of users seeking data across multiple modalities. Cross-modal retrieval has emerged to address this gap: it enables interaction across modalities, supports semantic matching, and exploits the complementarity and consistency of heterogeneous data. Although prior surveys have reviewed cross-modal retrieval, they fall short in timeliness, taxonomy, and comprehensiveness. This article presents a comprehensive review of the evolution of cross-modal retrieval, from shallow statistical analysis techniques to vision-language pretraining (VLP) models. Starting from a taxonomy grounded in machine learning paradigms, mechanisms, and models, it examines in depth the principles and architectures underlying existing cross-modal retrieval methods. It also gives an overview of widely used benchmarks, evaluation metrics, and reported performance. Finally, it examines the prospects and challenges of contemporary cross-modal retrieval and discusses potential directions for further progress. To facilitate ongoing research on cross-modal retrieval, we provide a user-friendly toolbox and an open-source repository at https://cross-modal-retrieval.github.io.
Published in: Proceedings of the IEEE (Volume: 112, Issue: 11, November 2024)
Index Terms:
- Cross-modal Retrieval
- Heterogeneous Data
- Semantic Matching
- Deep Learning
- Deep Network
- Latent Space
- Semantic Similarity
- Hash Function
- Common Space
- Shared Space
- Binary Code
- Canonical Correlation Analysis
- Source Domain
- Self-supervised Learning
- Retrieval System
- Metric Learning
- Common Representation
- Retrieval Accuracy
- Multimodal Features
- Retrieval Results
- Metric Learning Methods
- Retrieval Performance
- Cross-modal Interactions
- Latent Dirichlet Allocation
- Generative Adversarial Networks
- Transformer Architecture
- Specific Scenarios
- Matrix Factorization Method
- Image Features