Feature First: Advancing Image-Text Retrieval Through Improved Visual Features


Abstract:

Current image-text retrieval methods mainly use region features, which provide object-level information to represent images and make retrieval results more accurate and interpretable. However, region features have several issues: they lack rich contextual information, lose object details, and risk detection redundancy. The ideal visual features for image-text retrieval should have three characteristics: they should be object-level, semantically rich, and language-aligned. To this end, we propose a novel visual representation framework to capture more comprehensive and powerful visual features. Specifically, since grid features are strong precisely where region features are weak, we first build a two-step interaction model that explores the complex relationship between the two from the spatial and semantic perspectives and integrates their complementary information, making the fused visual features both object-level and semantically rich. Then, we design a text-integrated visual embedding module that uses textual information as guidance to filter redundant regions, making the visual features language-aligned as well. Finally, we develop a multi-attention pooling module to aggregate these enhanced visual features in a more fine-grained manner. Extensive experiments demonstrate that our proposed model achieves state-of-the-art performance on the benchmark datasets Flickr30K and MS-COCO.
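
The abstract does not specify how the text-guided filtering or the multi-attention pooling is computed, so the following is only a minimal illustrative sketch of the general idea, not the paper's implementation. It assumes cosine similarity to a single text vector as the filtering criterion, top-k selection, dot-product attention, and averaging across attention queries; the function names (filter_regions, attention_pool, multi_attention_pool) and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def filter_regions(features, text_vec, keep):
    """Assumed filtering rule: keep the `keep` features most similar to the text."""
    ranked = sorted(features, key=lambda f: cosine(f, text_vec), reverse=True)
    return ranked[:keep]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Weighted average of features; weights are softmax of feature-query dot products."""
    weights = softmax([sum(a * b for a, b in zip(f, query)) for f in features])
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def multi_attention_pool(features, queries):
    """Assumed aggregation: pool once per query, then average the pooled vectors."""
    pooled = [attention_pool(features, q) for q in queries]
    dim = len(pooled[0])
    return [sum(p[i] for p in pooled) / len(pooled) for i in range(dim)]


# Toy data (hypothetical): three 2-D visual features and one text vector.
regions = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
text = [1.0, 0.0]

# The feature least similar to the text, [0.0, 1.0], is dropped.
kept = filter_regions(regions, text, keep=2)

# Two attention queries, each emphasizing a different dimension.
image_vec = multi_attention_pool(kept, queries=[[1.0, 0.0], [0.0, 1.0]])
```

In a trained model the queries and the text vector would be learned embeddings rather than hand-picked values; the sketch only shows how filtering followed by several attention-weighted averages yields a single image vector.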
Published in: IEEE Transactions on Multimedia (Volume: 26)
Page(s): 3827 - 3841
Date of Publication: 15 September 2023
